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Abstract 

The  Air  Force  Institute  of  Technology’s  Advanced  Navigation  Technology  center 
has  invested  significant  research  time  and  effort  into  alternative  precision  navigation 
methods  in  an  effort  to  counteract  the  increasing  dependency  on  Global  Positioning 
System  for  precision  navigation.  Such  alternative  navigation  methods  have  become 
particularly  useful  in  environments  where  the  required  direct  line  of  sight  to  the  satel¬ 
lite  constellation  is  not  available  or  when  the  enemy  is  purposely  denying  access  to 
the  GPS  signal.  The  use  of  visual  sensors  and  feature  tracking  has  since  emerged 
as  a  valuable  and  feasible  precision  navigation  alternative  which,  when  coupled  with 
inertial  navigation  sensors  can  reduce  navigation  estimation  errors  by  approximately 
two  orders  of  magnitude  [22],  Although  the  basic  mathematics  and  algorithms  have 
been  thoroughly  documented,  image-aided  navigation  is  still  in  its  early  stages.  This 
research  aims  to  improve  two  key  steps  within  the  image-aided  navigation  process: 
camera  calibration  and  landmark  tracking.  The  camera  calibration  step  is  improved 
by  automating  the  point  correspondence  calculation  within  the  standard  camera  cali¬ 
bration  algorithm,  thereby  reducing  the  required  time  for  calibration  while  maintain¬ 
ing  the  output  model  accuracy.  The  landmark  tracking  step  is  improved  by  digitally 
simulating  affine  distortions  on  input  images  in  order  to  calculate  more  accurate  fea¬ 
ture  descriptors  for  improved  matching  through  large  changes  in  relative  viewpoint. 
These  techniques  are  experimentally  demonstrated  in  an  outdoor  environment  with  a 
consumer-grade  inertial  sensor  and  three  imaging  sensors,  one  of  which  is  orthogonal 
to  the  others.  Using  a  tactical-grade  inertial  sensor  coupled  with  GPS  position  data  as 
the  truth  source,  the  improved  image-aided  navigation  algorithm  is  shown  to  reduce 
navigation  errors  by  24%  in  position,  16%  in  velocity,  and  35%  in  attitude  compared 
to  the  standard  two-camera  image-aided  navigation  setup. 
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Enhanced  Image-aided  Navigation  Algorithm 
with  Automatic  Calibration 
and  Affine  Distortion  Prediction 

I.  Introduction 

The  success  of  the  United  States  Air  Force  mission  is  dependent  on  the  availabil¬ 
ity  of  precision  navigation  information.  This  need  is  currently  fulfilled  by  the 
combination  of  inertial  sensors  and  position  information  from  the  Global  Position¬ 
ing  System  (GPS).  Without  GPS,  inertial  sensors  alone  can  only  provide  a  reliable 
solution  for  a  short  term  before  inertial  drift-induced  uncertainty  exceeds  acceptable 
limits.  The  absence  of  an  alternate  inertial  error-constraining  technology  has  cre¬ 
ated  a  dependency  on  GPS,  which  is  vulnerable  to  disruption  from  urban  line-of-sight 
occlusion  or  intentional  enemy  jamming.  The  Chief  of  Staff  of  the  Air  Force  has 
addressed  this  issue  by  stating,  “It  seems  critical  to  me  that  the  Joint  Force  should 
reduce  its  dependence  on  GPS-aided  precision  navigation  and  timing,  allowing  it  to 
ultimately  become  less  vulnerable,  yet  equally  precise,  and  more  resilient”  [17]. 

1 . 1  Current  Technology 

One  of  the  major  research  thrusts  emerging  from  the  need  to  decrease  depen¬ 
dency  on  GPS  is  image-aided  navigation.  The  concept  of  image-aided  navigation 
involves  augmenting  inertial  sensors  with  imaging  sensors  in  order  to  track  visual 
landmarks.  Landmark  tracking  coupled  with  inertial  sensor  measurements  has  been 
shown  to  produce  stable  navigation  solutions  that  are  nearly  two  orders  of  magnitude 
more  accurate  than  inertial  sensors  alone  [22] .  This  research  focuses  on  improving  two 
aspects  within  image-aided  navigation:  camera  calibration  and  landmark  tracking. 

1.1.1  Camera  Calibration.  The  current  tool  of  choice  for  camera  calibration 
is  the  Camera  Calibration  Toolbox  from  Caltech  [1],  The  algorithms  used  inside 
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the  tool  are  heavily  based  on  Dr.  Zhang’s  camera  calibration  work  [24]  and  further 
explained  in  the  next  chapter.  Currently,  a  standard  two-camera  calibration  procedure 
is  as  follows: 

1.  First,  the  cameras  are  rigidly  mounted  to  a  camera  bar. 

2.  Next,  a  planar  calibration  surface  such  as  the  one  illustrated  in  Figure  1.1  is 
held  still  while  both  cameras  capture  an  image. 

3.  The  planar  surface  is  then  moved  into  a  different  pose  and  the  cameras  capture 
another  image. 

4.  This  process  is  repeated  until  a  sufficient  number  poses  have  been  captured 
(usually  around  10). 

5.  Next,  the  user  must  manually  step  through  each  image  for  each  camera  and 
click  on  the  outer  four  corners  of  calibration  surface  as  shown  in  Figure  1.2. 

6.  If  the  calibration  surface  is  not  fully  visible  within  an  image,  such  image  must 
be  discarded. 

7.  The  calibration  toolbox  uses  the  corner  mapping  from  each  image  to  calculate 
intrinsic  and  extrinsic  parameters  for  each  camera. 

8.  Lastly,  the  calibration  toolbox  produces  the  relative  translation  and  rotation 
between  the  two  cameras  and  calibration  is  complete. 

1.1.2  Landmark  Tracking.  The  current  image-aided  navigation  algorithm 
employed  at  the  Advanced  Navigation  Technology  (ANT)  center  uses  stochastic- 
constrained  search  areas  derived  from  inertial  measurements  to  ford  and  track  fea¬ 
tures  from  one  discrete  time  to  the  next.  A  successful  feature  match  depends  on  the 
Euclidean  distance  between  the  128-vector  descriptor  of  a  feature  and  its  candidate 
match.  Figure  1.3  illustrates  successful  (green)  and  unsuccessful  matches  (yellow) 
over  consecutive  time  steps.  Common  causes  for  unsuccessful  matches  include  fea¬ 
tures  leaving  the  image  plane  between  discrete  time  steps  and  large  affine  distortions 
arising  from  either  vehicle  movement  or  camera  selection. 
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Calibration  images 


Figure  1.1:  Traditional  planar  calibration  surface.  The  chessboard  pattern 
printed  on  the  planar  surface  allows  for  practical  feature  mapping  from  simple 
corner  detection. 

1.2  Problem  Formulation 

The  problem  addressed  in  this  research  is  composed  of  two  parts.  First,  the 
current  camera  calibration  procedure  is  unnecessarily  manual  and  therefore  slow  and 
prone  to  user-induced  error.  The  first  goal  is  to  automate  the  entire  camera  calibra¬ 
tion  process  by  using  a  different  planar  calibration  surface,  one  with  distinct  features 
that  can  be  automatically  recognized  and  matched  by  a  computer  without  user  in¬ 
teraction  (eliminating  the  need  to  click  on  the  outer  four  corners  of  the  chessboard). 
Second,  the  current  landmark  tracking  algorithm  does  not  account  for  affine  distor¬ 
tions  arising  from  drastic  changes  in  vehicle  position  and  camera  selection,  such  as 
when  a  feature  tracked  by  a  forward  camera  moves  into  the  field  of  view  of  a  side 
camera.  Therefore,  the  second  goal  is  to  develop  an  affine  distortion  prediction  algo¬ 
rithm  that  preemptively  transforms  tracked  features  using  inertial  measurements  in 
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Figure  1.2:  Outer  corner  delineation.  The  user  must  individually  click  on  the 
outer  four  corners  of  each  image  to  identify  the  boundaries  of  the  calibration 
surface. 

order  to  improve  feature  matching  success  rate  during  extended  periods  of  navigation 
and  over  orthogonal  cameras.  An  increased  feature  matching  success  rate  will  directly 
improve  the  navigation  solution  by  reducing  error. 

1.3  Research  Contributions 

The  primary  contribution  of  this  research  is  an  enhanced  image-aided  naviga¬ 
tion  algorithm  that  includes  a  deeply-coupled  image  and  inertial  navigation  solution 
with  predictive  affine  distortion  modeling.  Additionally,  this  research  delivers  a  fully 
automated  camera  calibration  algorithm  that  enables  the  calibration  of  cameras  with 
minimal  overlapping  between  fields  of  view.  The  algorithms  described  herein  generate 
a  stable  navigation  solution  accurate  to  within  1.5%  (of  distance  traveled)  in  posi¬ 
tion,  1  meter  per  second  in  velocity  and  4  degrees  in  attitude  using  a  consumer-grade 
inertial  sensor  and  three  consumer-grade  cameras. 


4 


Figure  1.3:  Feature  matching  illustration.  Successful  matches  from  tt  to  tl+i 
are  denoted  by  green  search  areas  while  unsuccessful  matches  are  denoted  by 
yellow  search  areas. 

1.4  Outline 

The  remainder  of  this  thesis  is  divided  into  four  additional  chapters.  Chap¬ 
ter  II  discusses  the  background  topics  needed  to  build  a  foundational  understanding 
of  inertial  navigation,  imaging  sensors,  camera  calibration,  image-aided  navigation 
and  feature  generation  and  matching.  Chapter  III  outlines  the  methods  used  to  de¬ 
velop  the  automatic  calibration  and  affine  distortion  prediction  algorithms  as  well  as 
the  experimental  procedures  used  to  evaluate  them.  The  experimental  results  are 
presented  and  analyzed  in  Chapter  IV,  and  finally,  conclusions  and  suggestions  for 
further  development  are  made  in  Chapter  V. 
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II.  Background 


This  chapter  outlines  the  major  concepts  needed  to  implement  the  automatic  cal¬ 
ibration  and  affine  distortion  prediction  algorithms  described  in  this  thesis  and 
to  understand  their  importance.  The  notational  conventions  used  throughout  this 
thesis  are  presented  in  Section  2.1.  A  brief  overview  on  the  use  of  reference  frames 
for  navigation  is  given  in  Section  2.2.  Section  2.3  overviews  the  basic  concepts  in 
inertial  navigation.  Notable  digital  signal  processing  techniques  used  in  this  research 
are  discussed  in  Section  2.4.  Camera  models  and  camera  calibration  techniques  are 
summarized  in  Section  2.5.  Sections  2.6  and  2.7  cover  the  basic  concepts  in  Kalman 
filtering  and  image-aided  navigation  as  well  as  their  associated  limitations.  The  ma¬ 
jor  contributions  from  previous  research  efforts  are  briefly  discussed  throughout  the 
appropriate  sections. 

2.1  Mathematical  Notation 

The  quantities  expressed  in  equations,  figures,  tables  and  text  throughout  this 
research  adhere  to  the  following  conventions: 

Scalars:  Scalars  are  represented  by  either  upper  or  lowercase  characters  in  italics, 
e.g.,  a  or  A. 

Vectors:  Vector  quantities  are  represented  by  lowercase  characters  in  bold,  e.g.,  a. 
Unless  specifically  stated  otherwise,  all  vectors  should  be  interpreted  as  column 
vectors. 

Vector  Components:  The  scalar  components  of  a  vector  are  represented  with  sub¬ 
scripts  indicating  their  corresponding  axes,  e.g.,  the  ^-component  of  the  vector 
a  is  represented  as  ax. 

Homogeneous  Vectors:  Homogeneous  vectors  are  built  by  augmenting  standard 
vectors  with  an  additional  component  equal  to  1  and  are  represented  with  an 
underscore,  e.g.,  a.  The  use  of  homogeneous  vectors  is  further  discussed  in 
Section  2.2. 1.3. 
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Matrices:  Matrices  are  represented  by  uppercase  characters  in  bold,  e.g.,  A  or 

Estimated  Variables:  Variables  that  represent  an  estimate  of  a  particular  quantity 
are  represented  with  the  “hat”  accent,  e.g.,  a. 

Direction  Cosine  Matrices:  Direction  cosine  matrices  representing  a  rotation  from 
frame  a  to  frame  b  are  denoted  by  C^. 

Reference  Frame:  If  a  vector  is  expressed  in  a  specific  reference  frame,  a  superscript 
letter  is  used  to  designate  the  reference  frame,  e.g.,  pa  is  a  vector  in  the  a  frame. 

Relative  Position  or  Motion:  In  cases  where  is  it  important  to  specify  relative 
motion,  combined  subscript  letters  are  used  to  designate  the  frames,  e.g.,  u>ab 
represents  the  angular  rate  vector  from  frame  a  to  frame  b. 

A  Priori  and  A  Posteriori  Estimates:  When  describing  the  operation  of  a  Kalman 

filter  algorithm,  it  is  necessary  to  distinguish  between  estimates  computed  be¬ 
fore  (a  priori )  or  after  (a  posteriori)  a  measurement  update.  In  such  instances, 
a  “minus”  character  superscript  is  added  to  the  variable  for  a  priori  estimates 
while  a  “plus”  character  superscript  is  added  to  a  posteriori  estimates,  e.g., 
a (t~)  or  a(t+). 

2. 2  Reference  Frames 

Navigation  reference  frames  are  fundamentally  important  when  expressing  po¬ 
sition,  velocity,  and  orientation  of  a  body.  For  this  research,  the  following  reference 
frames  are  defined  based  on  those  presented  in  [3],  [20]  and  [22]: 

The  true  inertial  frame  (/-frame)  -  a  theoretical  reference  frame  in  which  New¬ 
ton’s  laws  of  motion  apply.  The  frame  is  defined  by  a  non-accelerating,  non¬ 
rotating  orthonormal  basis  in  M3.  Because  of  the  relative  nature  of  the  universe, 
the  true  inertial  frame  has  no  predefined  origin  or  orientation. 

The  Earth-centered  inertial  frame  (i-frame)  -  an  orthonormal  basis  in  M3,  with 

its  origin  at  the  center  of  mass  of  the  Earth.  The  x  and  y  axes  are  located 
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on  the  equatorial  plane  with  the  x-axis  pointing  towards  Aries.  The  x-axis 
points  towards  the  North  Pole.  The  i-frame  is  a  non-rotating  frame,  but  it  does 
accelerate  with  respect  to  the  true  inertial  frame  due  to  the  relative  rotation 
between  celestial  bodies.  However,  for  terrestrial  navigation  purposes,  it  can  be 
considered  an  inertial  reference  frame.  The  i-frame  is  illustrated  in  Figure  2.1. 

The  Earth-centered  Earth-fixed  frame  (e-frame)  -  an  orthonormal  basis  in  M3, 
with  its  origin  also  at  the  Earth’s  center  of  mass.  The  e-frame  is  rigidly  attached 
to  the  Earth,  with  the  x-axis  on  the  equatorial  plane  pointing  toward  the  Green¬ 
wich  meridian,  the  x-axis  aligned  with  the  North  Pole,  and  the  y-axis  on  the 
equatorial  plane  pointing  toward  90  degrees  East  longitude.  Because  the  e- 
frarne  is  a  true  Cartesian  reference  frame,  some  navigation  computations  are 
simplified.  The  e-frame  is  illustrated  in  Figure  2.1. 

The  vehicle-fixed  navigation  frame  (n'-frame)  -  an  orthonormal  basis  in  M3, 
with  its  origin  located  at  a  predefined  point  on  a  vehicle  (e.g.,  the  vehicle’s 
center  of  gravity  or  the  center  of  a  triad  of  inertial  sensors,  etc.)  The  vehicle- 
fixed  navigation  frame’s  x,  y,  and  x  axes  point  in  the  North,  East  and  down 
(NED)  directions,  respectively.  Although  the  concept  of  down  is  open  to  inter¬ 
pretation,  for  the  purposes  of  this  research,  down  is  defined  as  the  direction  a 
plumb  line  would  point  due  to  gravity.  The  A-frame  rotates  with  respect  to  the 
e-frame  due  to  translational  motion  of  the  vehicle  and  the  rotation  of  the  earth. 
The  A-frame  is  illustrated  in  Figure  2.1. 

The  Earth-fixed  navigation  frame  (n-frame)  -  an  orthonormal  basis  in  M3,  with 
its  origin  located  at  a  predefined  location  on  the  Earth,  typically  on  the  surface. 
The  Earth-fixed  navigation  frame’s  x,  y,  and  z  axes  point  in  the  North,  East, 
and  down  directions  relative  to  the  origin,  respectively.  As  in  the  previous 
case,  down  is  defined  as  the  direction  of  the  gravity  vector.  In  contrast  to 
the  vehicle-fixed  navigation  frame,  the  Earth-fixed  navigation  frame  remains 
fixed  to  the  surface  of  the  Earth.  While  this  frame  is  not  useful  for  very-long 


Figure  2.1:  The  inertial  and  Earth  frames  originate  at  the  Earth’s  center 

of  mass,  while  the  vehicle-fixed  navigation  frame’s  origin  is  located  at  a  fixed 
location  on  a  vehicle  [22], 


distance  navigation,  it  can  simplify  the  navigation  kinematic  equations  for  local 
navigation  routes.  The  n-frame  is  illustrated  in  Figure  2.2. 

The  body  frame  (6-frame)  -  an  orthonormal  basis  in  M3 ,  rigidly  attached  to  the 
vehicle  with  its  origin  co-located  with  the  navigation  frame.  The  x,  y,  and  z 
axes  point  out  the  nose,  right  wing,  and  bottom  of  an  aircraft,  respectively. 
Strapdown  inertial  sensors  are  fixed  to  the  6-frame,  although  they  may  not  be 
located  at  the  origin  or  aligned  with  the  axes.  The  6-frame  is  shown  in  Figure  2.3. 

The  camera  frame  (c-frame)  -  an  orthonormal  basis  in  M3,  rigidly  attached  to  a 
camera,  with  its  origin  at  the  camera’s  optical  center.  The  x  and  y  axes  point 
np  and  to  the  right,  respectively,  and  are  parallel  to  the  image  plane  of  the 
camera.  The  z-axis  points  out  of  the  camera  perpendicular  to  the  image  plane. 
The  c-frame  is  shown  in  Figure  2.4. 
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Figure  2.2:  The  Earth-fixed  navigation  frame  is  a  Cartesian  reference  frame 
with  the  x  and  y  axes  perpendicular  to  the  gravity  vector,  the  2- axis  aligned 
with  the  gravity  vector,  and  its  origin  fixed  to  the  Earth  [22], 

The  binocular  disparity  frame  (co-frame)  -  an  orthonormal  basis  in  M3 .  which 
is  rigidly  attached  to  the  lever  arm  located  between  cameras  in  a  binocular 
configuration,  with  its  origin  at  a  specified  point  on  the  lever  arm.  The  x, 
y,  and  2  axes  point  up,  right,  and  forward,  respectively,  and  parallel  to  the 
corresponding  ca- frame  axes.  The  Co-frame  is  shown  in  Figure  2.5. 

Image  frame  (pzx-frame)  -  an  orthonormal  basis  in  M2  with  its  origin  beyond  the 
upper-left  pixel  of  a  digital  image.  Multiple  conventions  exist  throughout  im¬ 
age  processing  literature  for  pixel  indexing  and  axis  orientation.  In  this  thesis, 
images  are  indexed  according  to  the  matrix  storage  format  used  by  The  Math- 
works,  Inc’s  Matlab  software,  with  the  upper  left  pixel  indexed  as  (1,1),  the 
x-axis  down  the  left  side  and  y-axis  across  the  top  of  the  image. 
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Figure  2.3:  Aircraft  body  frame  illustration.  The  aircraft  body  frame  origi¬ 
nates  at  the  aircraft  center  of  gravity  [22], 

Calibration  board  frame  (ca/-frame)  -  an  orthonormal  basis  in  M3  with  its  origin 
at  the  upper-left  corner  of  the  calibration  board.  The  x  and  y  axes  are  aligned 
with  the  calibration  board  while  the  2- axis  points  upward,  away  from  the  board. 
The  cod-frame  is  shown  in  Figure  2.6. 

2.2.1  Coordinate  Transformations.  It  is  often  necessary  to  relate  measured 
vector  quantities  from  one  frame  to  another.  This  is  accomplished  by  applying  a 
coordinate  transformation  matrix,  which  can  be  composed  of  a  translation  and/or  a 
rotation  component.  Translations  describe  relative  positions  between  the  origins  of 
two  reference  frames,  while  rotations  describe  the  relative  angle  between  the  principal 
axes  of  two  reference  frames. 

2.2. 1.1  Translation  Vectors.  When  two  reference  frames  differ  only 
in  relative  origin  position  (e.g.,  their  principal  axes  are  co-directional),  a  translation 


Figure  2.4:  Camera  frame  illustration.  The  camera  reference  frame  origi¬ 

nates  at  the  optical  center  of  the  lens  [22], 

vector  can  be  used  to  express  vectors  from  a  coordinate  frame  in  terms  of  a  second 
frame.  The  description  of  point  P  in  terms  of  frame  a  can  be  expressed  in  terms  of  a 
co-directional  frame  b  using 

Pb  =  Pa~Paab  (2-1) 

where  is  the  position  of  point  P  in  frame  b  and  the  vector  p“b  is  the  translation  of  the 
a-frame  to  the  6-frame  in  a- frame  coordinates.  Figure  2.7  illustrates  the  application 
of  a  translation  vector  between  two  reference  frames. 

2.2. 1.2  Direction  Cosine  Matrices.  While  a  translation  vector  defines 
the  relative  origin  position  between  two  reference  frames,  a  Direction  Cosine  Matrix 
(DCM)  defines  the  relative  rotation  between  the  principal  axes  of  two  reference  frames. 
DCMs  apply  rotations  to  each  axis  in  a  reference  frame,  using  the  standard  Euler 
angles,  and  in  the  following  order  [16]: 

1.  First,  a  rotation  angle  ^  is  applied  about  the  z-axis  of  the  originating  frame. 


12 


Figure  2.5:  Binocular  disparity  frame  illustration.  The  binocular  disparity 
frame  originates  at  the  midpoint  between  the  optical  center  of  the  two  camera 
frames,  ca  and  q,  [22], 


2.  Next,  a  rotation  angle  9  is  applied  about  the  y-axis  of  the  newly  formed  inter¬ 
mediate  reference  frame. 

3.  Finally,  a  rotation  angle  0  is  applied  about  the  x-axis  of  the  second  intermediate 
reference  frame  formed  in  step  2. 

Once  the  desired  Euler  angles  are  defined,  a  DCM  can  be  built  using 
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(2.2) 


where  Cba  is  the  DCM  that  rotates  the  principal  axes  of  frame  a  to  become  co-linear 
with  the  principal  axes  in  frame  b.  A  vector,  originally  written  in  terms  of  frame  a, 
ya  can  be  represented  in  terms  of  frame  b  using 

y6  =  Cbaya  (2.3) 

Additionally,  due  to  their  purely  rotational  function,  DCMs  have  the  following 
properties: 
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Figure  2.6:  Calibration  board  frame  illustration.  The  calibration  board 

frame  originates  at  the  upper-left  corner  of  the  calibration  board  [1]. 

•  The  determinant  of  any  DCM  is  always  equal  to  1,  ensuring  vector  magnitudes 
are  always  preserved  during  rotations. 

•  A  DCM  is  guaranteed  to  have  an  inverse  since  it  has  a  nonzero  determinant. 

•  A  DCM  that  aligns  frame  b  to  frame  a  is  simply  the  inverse  of  the  DCM  that 
aligns  frame  a  to  frame  b. 

•  The  inverse  of  a  DCM  equals  its  transpose. 

2.2. 1.3  Transformation  Matrices.  Transformation  matrices  provide  a 
practical  method  to  apply  both  a  translation  and  a  rotation  to  a  vector  or  point  using 
one  mathematical  operation.  In  order  to  use  transformation  matrices,  a  homogeneous 
vector  is  constructed  by  augmenting  the  original  vector  with  an  additional  element 
set  equal  to  one,  i.e. , 

P  =  \Px  Py  Pz  |  1]T  (2.4) 
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Figure  2.7:  Translation  vectors.  The  vector  p ab  translates  the  origin  of  frame 
a  to  the  origin  of  frame  b  and  can  be  used  to  describe  the  position  of  point  P 
relative  to  frame  b  given  its  coordinates  in  terms  of  frame  a. 

A  transformation  matrix  is  then  composed  of  a  rotation  DCM  and  a  translation 
vector.  For  example,  given  a  DCM  C^,  which  aligns  frame  a  to  frame  b,  and  a 
translation  vector  p“6,  which  collocates  the  origin  of  frame  a  with  the  origin  of  frame 
b,  the  a-frame  vector  p“  can  be  expressed  in  terms  of  frame  b  using 


(2.5) 


where 


(2.6) 


0lx3  |  1 


2.3  Inertial  Navigation 

Inertial  navigation  is  based  on  the  basic  concept  that,  starting  from  a  known 
location,  attitude,  and  velocity,  a  vehicle’s  current  position  and  attitude  can  be  esti¬ 
mated  by  integrating  measured  changes  in  velocity  and  rotation.  Inertial  navigation 
measurement  devices  such  as  an  Inertial  Measurement  Unit  (IMU)  consist  of  three 
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accelerometers,  which  measure  specific  force  and  three  gyroscopes  (commonly  referred 
to  as  gyros),  which  measure  rotational  velocity  relative  to  the  /-frame. 

When  equipped  with  the  navigation  equations  necessary  to  integrate  for  posi¬ 
tion,  velocity  and  attitude,  the  entire  measurement  system  is  referred  to  as  an  Inertial 
Navigation  System  (INS).  In  general,  there  are  two  types  of  INSs:  platform  and  strap- 
down.  A  platform  INS  is  composed  of  measurement  sensors  mounted  on  a  platform 
and  gimbaled  such  that  the  vertical  sensor  of  the  unit  is  always  aligned  with  local 
gravity.  In  contrast,  a  strapdown  INS  consists  of  a  a  simple  IMU  rigidly  mounted  to 
a  vehicle  with  its  motion  sensors  mounted  orthogonally  and  aligned  with  the  vehicle’s 
/-frame.  During  the  experiments  conducted  for  this  thesis,  vehicles  were  equipped 
with  Micro  Eletro-mechanical  Systems  (MEMS)  grade  strapdown  IMUs  due  to  their 
small  size,  light  weight  and  low-power  consumption.  The  navigation  equations  were 
then  implemented  in  software  separate  from  the  device.  Further  information  regarding 
strapdown  inertial  navigation  can  be  found  in  the  works  of  Titterton  and  Weston  [20]. 

2-4  Digital  Image  Processing 

This  section  explores  the  basic  concepts  behind  digital  imaging  and  signal  pro¬ 
cessing  on  which  camera  calibration,  landmark  tracking  and  image-aided  navigation 
are  built. 

2-4-1  Digital  Imaging.  In  this  research,  the  digital  imaging  model  consists  of 
a  light  source,  a  subject  or  scene,  an  optical  collection  device  and  an  imaging  sensor. 
The  light  source  illuminates  the  subject  while  the  reflected  light  rays  are  collected  by 
a  lens  and  captured  by  an  optical  image  sensor.  The  image  sensor  translates  image 
intensities  into  voltages,  which  are  then  quantized  into  discrete  Red  Green  Blue  (RGB) 
values  by  an  analog-to-digital  converter  as  shown  in  Figure  2.8. 

2. 4-1-1  Perspective  Projection  Geometry.  Perspective  projection  ge¬ 
ometry  describes  the  mathematical  relationship  between  the  3-D  coordinates  of  a 
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Figure  2.8:  Image  sensor  diagram.  Light  captured  by  the  lens  is  interpreted 
by  the  imaging  sensor  to  produce  a  digital  image  [16]. 

point  in  the  c-frame  and  its  projection  onto  the  2-D  pix- frame.  The  physical  proper¬ 
ties  of  a  scene  can  then  be  extracted  from  captured  images  using  such  relationships. 
The  process  by  which  a  3-D  point  is  projected  onto  the  2-D  image  plane  is  referred 
to  as  perspective  projection.  The  perspective  projection  model  is  derived  from  the 
basic  pinhole  camera  shown  in  Figure  2.8.  In  the  standard  pinhole  camera  model, 
light  rays  travel  from  the  subject,  pass  through  the  lens  and  are  projected  onto  the 
focal  plane  located  at  a  distance  /  behind  the  lens.  However,  because  light  rays  travel 
in  straight  lines  through  the  lens  and  towards  the  origin  of  the  imaging  sensor,  the 
received  image  is  naturally  inverted.  In  order  to  account  for  and  undo  this  natural 
image  inversion,  a  virtual  image  plane  is  placed  at  a  distance  /  in  front  of  the  lens,  on 
which  geometric  proportions  are  identical  to  those  perceived  by  the  imaging  sensor. 
The  modified  pinhole  camera  model  is  illustrated  in  Figure  2.9.  The  vectors  depicted 
in  Figure  2.9  will  be  discussed  in  the  next  section. 
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Figure  2.9:  Pinhole  camera  model.  The  effects  of  image  rotation  through 

the  lens  are  modeled  by  placing  a  virtual  image  plane  in  front  of  the  sensor  [16] . 


24.1.2  Intrinsic  Camera  Matrix.  Using  the  properties  of  perspective 
projection  geometry,  the  mathematical  expression  that  transforms  a  3-D  point  in  the 
c-frame  into  a  2-D  point  in  the  pix- frame  can  be  developed.  The  vector  sc  originates 
at  the  camera  sensor  origin  Oc  and  terminates  at  S.  As  shown  in  Figure  2.9,  the  vector 
Sproji  which  also  originates  at  Oc  but  terminates  at  the  image  plane,  is  collincar  with 
and  a  scalar  multiple  of  sc.  Since  the  image  plane  is  located  at  z  —  f,  the  ^-coordinate 
of  Spro  ■  is  also  /.  Using  similar  triangle  geometry,  the  scaling  factor  which  relates  the 
two  vectors  can  be  derived  as  shown  in  Equation  (2.7) 
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proj 


(2.7) 


Next,  the  ^-coordinate  in  Sproj-  is  discarded  as  the  point  moves  from  the  c-frame 
to  the  pix-frame.  The  ^-coordinate  removal  can  be  represented  mathematically  using 
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Figure  2.10:  Image  plane  diagram.  The  image  plane  has  both  physical 

dimensions  H  x  W  and  pixel  dimensions  M  x  N .  The  x  and  y  axes  from  the 
c- frame  project  onto  the  image  plane  at  the  coordinates  [16]. 


where  sproj  is  the  2-D  projection  of  sproj  onto  the  pix- frame.  Finally,  the  translation 
and  scaling  between  the  c-frame  and  the  pix-hame  is  shown  in  Figure  2.10.  The 
relative  rotation  between  the  two  frames  involves  simply  inverting  the  x-axis  and  is 
done  using 


pPix  = 
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p  proj 


(2.9) 


where  the  vectors  pptx  and  ppr0J  describe  the  position  of  a  point  p  in  the  pix-frame 
and  projected  c-frame  respectively.  The  projected  c-frame  is  depicted  in  Figure  2.10 
by  the  xproj  and  yproj  axes  and  is  obtained  by  projecting  the  3-D  c-frame  onto  the 
2-D  image  plane. 
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The  scaling  factors  are  derived  from  the  ratios  of  physical-to-pixel  dimensions 
in  the  image  plane.  The  x-axis  is  scaled  from  a  physical  length  H  to  a  pixel  length 
M  which  leads  to  a  scaling  factor  of  M/H  in  the  x-direction.  Similarly,  the  y- axis  is 
scaled  using  the  factor  N/W.  Finally,  the  origin  of  the  c-frame  is  offset  by  pixels 
in  the  x-direction  and  pixels  in  the  ^/-direction.  Combining  all  transformations 
between  the  c-frame  and  pix-frame  up  to  this  point  yields 
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where  the  vector  spzx  is  the  2-D  projection  of  a  point  in  the  c-frame,  described  by 
Sproj,  onto  the  pix- frame.  Finally,  homogeneous  vector  representations  can  be  used  to 
describe  the  combined  rotation,  translation,  and  scaling  through  the  transformation 
matrix 

rr\pix 


where 

spvx  =  -Tpixsc  (2.12) 

The  camera  model  can  also  be  used  to  convert  coordinates  from  the  pix-frame 
back  to  the  c-frame.  A  set  of  coordinates  expressed  in  terms  of  the  pix-frame  can  be 
converted  back  to  c-frame  using 
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where 


sc  =  T^s**  (2.14) 

Note  that  since  there  is  a  loss  of  dimension  when  transforming  from  the  c-frame 
to  the  pix- frame,  it  is  impossible  to  fully  invert  the  transform.  Rather,  Equation  (2.14) 
yields  a  homogeneous  3-D  vector  that  is  co-directional  with  the  true  vector  but  not 
necessarily  equal.  The  stereo-vision  techniques  designed  to  recover  the  lost  dimension 
are  discussed  in  Section  2.5.3. 

2-4-2  Scale  Invariant  Feature  Transform.  Feature  identification  and  match¬ 
ing  are  integral  processes  within  the  proposed  automatic  calibration  and  affine  distor¬ 
tion  prediction  algorithms.  Although  many  feature  detection  and  matching  techniques 
are  found  throughout  computer  vision  literature,  only  a  few  generate  features  that  can 
be  recognized  through  changes  in  camera  position  and  orientation  with  respect  to  the 
scene,  a  property  which  is  of  fundamental  importance.  That  is,  the  feature  descrip¬ 
tion  for  each  feature  should  be  invariant  to  changes  in  scale,  rotation  or  translation 
within  the  image-space.  Because  of  these  requirements,  the  Scale  Invariant  Feature 
Transform  (SIFT)  algorithm  developed  by  Lowe  [12]  was  chosen  as  the  feature  de¬ 
tection  algorithm  used  in  this  research.  The  SIFT  algorithm  is  composed  of  four 
main  stages:  scale-space  extrema  detection,  keypoint  localization,  orientation  assign¬ 
ment  and  keypoint  description.  This  section  explores  how  these  four  stages  produce 
a  feature  detection  algorithm  that  is  invariant  to  scale,  rotation,  and  translation. 

2.4-2. 1  Scale-Space  Extrema  Detection.  In  the  first  stage  of  the  trans¬ 
form,  stable  features  are  identified  within  the  image  across  all  possible  scales  using  a 
continuous  scale  function  known  as  scale-space.  The  scale-space  function  L(x,y,  cr)  is 
built  using  the  Gaussian  distribution  as  a  base  function  kernel  and  is  given  by 

L(x,y,a)  —  G(x,y,a)  ®I(x,y)  (2.15) 
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where  I(x,y )  is  an  input  image,  ©  is  the  convolution  operator  and  G(x,y ,  a)  is  given 
by 

G(x,y,a)  =  1^ie-^)l^  (2.16) 

A  Difference  of  Gaussians  (DOG)  function  D(x,y,a )  is  constructed  in  order  to 
provide  scale  invariant  detection  and  is  given  by 

D(x,  y,  a)  =  (G(x,  y,  ka)  -  G(x,  y,  cr))  *  I(x,  y)  (2.17) 

=  L(x,y,ka)  -  L(x,y,a)  (2.18) 

which  yields  a  computationally  efficient  approximation  to  the  scale-normalized  Lapla- 
cian  of  Gaussian  function  [11],  thereby  providing  scale  invariance.  Figure  2.11  illus¬ 
trates  how  the  DOG  functions  are  constructed  for  an  input  image.  The  initial  image  is 
incrementally  convolved  with  2-D  Gaussian  distributions  to  produce  blurred  images, 
which  are  separated  by  a  constant  k  in  scale-space  (shown  in  the  left  column).  Next, 
adjacent  blurred  images  in  scale-space  are  subtracted  per  Equation  (2.18)  to  produce 
the  DOG  images  (shown  on  the  right  column).  Groups  of  blurred  images  are  referred 
to  as  octaves  and  are  separated  by  factors  of  2  in  scale-space.  After  an  entire  octave 
is  completed,  the  next  octave  is  formed  by  downsampling  the  k  =  2  blurred  image  by 
a  factor  of  2  and  repeating  the  process. 

After  forming  all  possible  DOG  functions,  the  local  extrema  within  all  the  re¬ 
sulting  images  needs  to  be  found.  To  do  so,  each  sample  point  is  compared  to  its 
eight  nearest  neighbors  in  the  current  image  as  well  as  the  nine  nearest  neighbors  in 
the  scale  image  above  and  below  as  shown  in  Figure  2.12.  Finally,  a  sample  point  is 
considered  a  possible  feature  only  if  it  is  larger  or  smaller  than  all  of  of  its  neighbors. 

2. 4 -2. 2  Keypoint  Localization.  Once  a  candidate  feature  has  been 
found  using  the  scale-space  extrema  process,  a  detailed  fit  to  the  nearby  data  in 
the  image  must  be  performed  to  determine  its  location,  scale,  and  ratio  of  principal 
curvatures.  The  data  fit  allows  the  rejection  of  candidate  features  that  have  low 
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Figure  2.11:  Difference  of  Gaussians  illustration.  Neighboring  scale-space 

images  are  subtracted  to  produce  the  Difference  of  Gaussians  function  [12]. 

contrast,  which  makes  them  susceptible  to  noise,  or  are  poorly  localized  along  an 
edge.  The  data  fit  is  found  through  a  Taylor  series  expansion  (up  to  the  quadratic 
terms)  of  the  DOG  function  D(x,y,a )  using 

,  .  dDT  1  rd2D 

B(x)  =  D  +  arx+2x  a^x  (2'19) 

where  D  is  evaluated  at  the  candidate  feature  location  and  x  =  (x,y,a)J  is  the 
offset  from  the  candidate  point.  The  location  of  the  extremum  x  within  the  image  is 
then  estimated  using  Equation  (2.19)  by  taking  its  derivative  with  respect  to  x  and 
setting  it  equal  to  zero.  If  the  resulting  offset  between  the  estimate  of  the  extremum’s 
location  x  and  the  current  candidate  is  larger  than  0.5  in  any  dimension,  a  different 
candidate  is  chosen  and  the  process  is  repeated.  Finally,  since  even  poorly  defined 
candidates  will  have  a  high  DOG  response  along  the  edges  of  the  image,  the  ratio  of 
their  principal  axes  is  analyzed  in  order  to  eliminate  any  candidates  that  have  a  high 
ratio  of  largest-to-smallest  eigenvalues,  indicating  they  are  on  an  edge. 
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Figure  2.12:  Local  extrema  illustration.  A  candidate  feature  must  be  smaller 
than  or  greater  than  all  of  its  26  neighbors  in  scale-space  [12]. 

2. 4- 2. 3  Orientation  Assignment.  Once  all  the  qualifying  features  have 
been  selected  from  the  elimination  processes  described  above,  a  relative  orientation  is 
assigned  to  each  feature  based  on  local  image  properties.  Computing  the  orientation 
of  each  feature  allows  the  generation  of  an  unique  descriptor,  which  identifies  the 
feature  in  its  own  orientation  frame.  Therefore,  features  can  later  be  matched  using 
their  descriptors  regardless  of  the  absolute  orientation  of  their  source  images,  making 
the  SIFT  algorithm  rotation  invariant.  A  feature’s  orientation  is  computed  from 
the  blurred  image  L  which  corresponds  to  the  feature’s  scale  a.  The  local  gradient 
magnitude  and  orientation  are  computed  around  the  feature  location  in  L  using 


m(x,y )  =  V  A2  +  B2 
9(x,y)  =  tan  ^(B/A) 


(2.20) 

(2.21) 
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where 


A  =  L(x  +  l,y)-L(x-l,y)  (2.22) 

B  =  L(x,y  +  1)  -  L(x,y  -  1)  (2.23) 

Finally,  a  histogram  of  the  orientations  is  built  from  sample  points  around  the 
feature  location.  Local  peaks  are  then  identified  within  the  histogram.  Multiple 
keypoints,  consisting  of  location,  scale,  and  orientation,  can  be  generated  from  a 
single  feature,  consisting  only  of  location  and  scale,  by  assigning  different  orientations. 
Usually,  the  orientations  in  the  top  20%  of  the  histogram  are  used  in  generating 
multiple  keypoints  from  a  particular  feature. 

2. 4-2. 4  Keypoint  Description.  Up  to  this  point,  keypoints  consisting  of 
location,  scale,  and  orientation  have  been  generated  from  an  input  image.  These  three 
parameters  are  then  used  to  generate  local  3-D  coordinate  frames  for  each  keypoint, 
providing  a  repeatable  method  of  keypoint  description,  which  is  invariant  to  such 
parameters.  The  final  step  in  the  SIFT  algorithm  generates  a  more  unique  keypoint 
descriptor  that  provides  partial  invariance  to  factors  such  as  illumination  and  small 
changes  in  3-D  viewpoint.  The  SIFT  keypoint  descriptor  process  is  based  on  a  model 
of  biological  vision  where  neurons  respond  to  gradients  at  particular  orientations  and 
spatial  frequencies.  Using  this  model,  a  keypoint  descriptor  is  created  by  computing 
the  gradient  magnitude  and  orientation  in  a  sampled  region  around  the  keypoint’s 
location.  Next,  the  orientations  are  weighed  by  a  2-D  Gaussian  function,  centered  at 
the  keypoint’s  location,  and  accumulated  into  8-bin  histograms,  which  summarize  the 
contents  over  4x4  subregions  as  shown  in  Figure  2.13.  Consequently,  the  keypoint 
descriptor  isa8x4x4  =  128  point  normalized  vector  containing  the  values  of  the 
gradient  orientation  histogram  bins  in  the  4x4  subregions. 

2-4-3  Feature  Matching  Techniques.  Szeliski  [19]  introduces  a  few  feature 
matching  methods  for  algorithms  that  produce  feature  or  keypoint  descriptors  such 
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Figure  2.13:  Keypoint  descriptor  illustration.  A  feature’s  unique  descriptor 
is  composed  from  a  histogram  of  the  image  gradients  around  its  location  [12]. 

as  SIFT.  In  this  thesis,  the  particular  feature  matching  algorithm  used  for  both  auto¬ 
matic  calibration  and  image-aided  navigation  is  the  Nearest  Neighbor  Distance  Ratio 
(NNDR).  NNDR  was  chosen  over  other  common  techniques  such  as  Random  Sample 
Consensus  (RANSAC)  [4]  due  to  the  deep-coupling  of  inertial  and  imaging  sensors 
provided  by  the  image-aided  navigation  algorithm  discussed  in  Section  2.7.  Feature 
matching  is  accomplished  by  comparing  feature  descriptors,  which  are  normalized 
vectors  in  128-space  as  discussed  in  Section  2. 4. 2. 4.  Using  NNDR,  potential  matches 
are  found  by  computing  the  Euclidean  distance  in  128-space  from  a  reference  feature 
to  all  possible  candidate  features.  Then,  the  following  ratio  is  computed  using  the 
two  closest  candidate  features 


r  = 


di 

d2 


(2.24) 


where  is  the  reference  feature  descriptor  and  d^  and  d#  are  the  two  closest  can¬ 
didate  feature  descriptors.  The  NNDR  computed  in  Equation  (2.24)  is  a  measure  of 
the  strength  of  a  particular  match.  If  the  ratio  is  much  less  than  1,  then  is  much 
closer  to  d/j  than  d#  and  therefore  the  match  is  strong.  In  contrast,  if  the  ratio  is 
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close  to  1,  then  both  and  d#  are  very  close  to  d/j  and  therefore  the  match  may  be 
too  weak  to  declare  it  with  confidence.  Setting  a  maximum  NNDR  ratio  that  must 
be  satisfied  helps  prevent  false  matches. 

2-4-4  Affine  Keypoint  Distortions.  The  keypoint  descriptor  generation  and 
matching  algorithms  discussed  up  to  this  point  provide  full  scale  and  rotation  invari¬ 
ance,  as  well  as  partial  illumination  invariance.  However,  another  major  necessity  in 
feature  tracking  is  affine  distortion  invariance.  As  the  viewpoint  of  a  physical  object 
changes,  either  through  vehicle  motion  or  camera  angle,  the  image  gradients  from 
which  keypoint  descriptors  are  derived  change  along  with  the  image  of  the  object. 
Fortunately,  the  change  can  be  adequately  modeled  using  affine  transformations  on 
the  initial  image.  Using  this  concept  as  a  foundation,  Morel  and  Yu  developed  the 
Affine  Scale  Invariant  Feature  Transform  (ASIFT)  algorithm  [15],  which  they  claim 
provides  full  affine  invariance.  The  ASIFT  algorithm  recursively  uses  the  basic  SIFT 
algorithm  described  in  the  previous  section  as  a  core  function.  Much  like  the  scale- 
space  development  used  by  Lowe  to  provide  scale  invariance,  Morel  and  Yu  simulate  a 
series  of  affine  transformations  on  a  given  image  and  calculate  standard  SIFT  keypoint 
descriptors  for  each  of  the  simulated  images.  When  compared  to  SIFT,  the  ASIFT 
algorithm  outputs  nearly  nine  times  as  many  keypoints  per  input  image  because  of 
the  affine  distortion  simulations.  The  ASIFT  algorithm  creates  multiple  versions  of 
the  standard  SIFT  descriptors  for  each  image,  which  increases  the  probability  of  ob¬ 
taining  successful  matches.  The  keypoint  matching  improvement  given  by  ASIFT  is 
illustrated  in  Figures  2.14  and  2.15.  The  predictive  affine  distortion  modeling  algo¬ 
rithm  introduced  in  Chapter  III  is  partly  based  on  the  ASIFT  concept.  However,  as 
later  explained  in  Section  2.6,  it  only  needs  to  generate  a  single  affine-transformed 
image,  since  distortion  parameters  are  estimated  in  the  Extended  Kalman  Filter. 
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Figure  2.14:  SIFT  matches  during  high  affine  distortion.  Since  the  SIFT 

algorithm  does  not  account  for  affine  distortions,  0  successful  matches  and  6 
false  matches  are  obtained  with  an  NNDR  of  0.55. 


Figure  2.15:  ASIFT  matches  during  high  affine  distortion.  Because  the 

ASIFT  algorithm  provides  full  affine  invariance,  100  successful  matches  and  0 
false  matches  are  obtained  with  an  NNDR  of  0.55. 

2. 5  Camera  Calibration 

Real  world  imaging  sensors  present  nonlinear  lens  distortions  that  must  be  mod¬ 
eled  and  accounted  for  in  computer  vision  algorithms  such  as  image-aided  navigation. 
Additionally,  the  relative  rotation  and  translation  between  a  camera  and  the  scene  it 
captures  (referred  to  as  extrinsic  parameters)  must  be  determined  for  proper  image 
analysis.  This  section  explores  the  camera  calibration  techniques  that  are  currently 
used  to  compensate  for  the  distortion  effects  and  calculate  the  extrinsic  configuration 
parameters  in  a  computer  vision  system. 
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2.5.1  Distortion  Models.  The  first  step  in  determining  the  nonlinear  lens 
distortion  of  an  imaging  sensor  is  to  model  the  distortion.  Brown  [2]  groups  the 
distortion  parameters  into  radial,  tangential,  and  skew  components. 

2.5. 1.1  Radial  Distortion.  Radial  distortion,  the  most  noticeable  of 
the  three,  causes  straight  lines  to  appear  curved  in  the  images  produced  by  the  sensor. 
The  main  cause  of  radial  distortion  is  non-uniform  magnification  inside  the  sensor’s 
optics.  Radial  distortion  is  modeled  as  a  function  of  r,  the  Euclidean  distance  between 
a  point  given  by  the  vector  sproj  in  the  c-frame  and  the  frame’s  origin,  which  is  given 
by 


drad  =  x  +  ^2  +  £^4  +  ^6  (2 .2 5) 

r  =  sJ(spxroj)2  +  ( spyroJ )2  (2.26) 


where  drad  is  the  radial  distortion  factor  for  a  particular  length  r,  and  k\ ,  k2,  and  k% 
are  constant  distortion  coefficients  calculated  through  camera  calibration. 


2.5. 1.2  Tangential  Distortion.  Tangential  distortion  is  less  visible  on 
images  produced  by  the  sensor  and  causes  the  principal  point  c  to  shift  away  from 
the  true  (geometric)  center  of  the  image  plane.  The  causes  of  tangential  distortion 
include  differing  curvatures  in  the  front  and  back  of  the  sensor’s  lens  and  misalignment 
between  the  sensor’s  lens  and  collection  array.  Brown  models  tangential  distortion  as 
a  2-D  function  of  r,  sproj  and  sproj  defined  by 


(  2Pl  (s™ )  (s™ )  +  p2  [r2  +  2 (spri * ) 2]  \ 

\  Pi[r2  +  2(s^')2  +  2 P2(srj)(sproj)\  ) 


2.5. 1.3  Skew  Factor.  The  distortion  caused  by  a  skew  factor  ac  is  the 
most  difficult  to  visualize  and  refers  to  the  orthogonality  between  the  x  and  y  axes 
in  the  //ix-frame.  Although  the  skew  factor  for  most  real  world  imaging  systems  is 
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nearly  zero,  it  is  not  negligible  when  extracting  real  world  metrics  from  images  of 
a  scene.  The  skew  factor  determines  how  far  the  pix-frame  axes  are  from  perfectly 
orthogonal.  When  included  in  a  camera  model,  the  skew  factor  multiplies  the  upper 
middle  coefficient  in  the  camera  transformation  matrix  such  that 

rM  _  rM  M+l 
J  H  acJ  H  2 

0  f%  s'  (2.28) 

0  0  1 

2.5. 1-4  Camera  Distortion  Model.  Combining  all  three  intrinsic  distor¬ 
tion  parameters  generates  a  complete  mathematical  representation  of  the  projection 
of  a  point  given  by  the  vector  sc  in  the  c-frame  onto  the  distorted  pix-frame,  which 
yields  the  true  pixel  coordinates  spix.  The  complete  camera  distortion  model  is  given 
by 

-/#  -ajf  ^  drad  0  dlan 

0  fw  ^r1  0  drad  dlan  sC  (2-29) 

0  0  1  0  0  1 

2.5.2  Calibration  Algorithms.  Having  defined  a  camera  distortion  model, 
an  adequate  method  for  calculating  its  parameters  can  be  developed.  Many  camera 
calibration  techniques  are  found  throughout  computer  vision  literature.  While  most 
require  the  extensive  use  of  complicated  calibration  equipment,  Dr.  Zhang  introduces 
a  practical  algorithm  that  requires  only  a  planar  surface  with  known  feature  coordi¬ 
nates  [24] .  Furthermore,  Zhang’s  algorithm  contains  the  fundamental  techniques  used 
by  Bouguet  in  his  camera  calibration  toolbox  for  Matlab®  [1]. 

2. 5. 2.1  Camera  Calibration  from  a  Planar  Surface.  Zhang  [24]  de¬ 
velops  a  flexible  and  robust  algorithm  for  calculating  the  intrinsic  and  extrinsic  pa¬ 
rameters  for  a  given  imaging  sensor  (or  set  of  sensors).  Using  the  distortion  model 
developed  in  the  previous  section,  Zhang  defines  the  relationship  between  a  3-D  point 
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and  its  2-D  image  projection  in  a  fashion  similar  to  Equation  (2.29)  such  that 


an 

A 


A[R  t]M 

(2.30) 

f «  c  u0 

o 

o 

(2.31) 

0  0  1 


where  s  is  an  arbitrary  scale  factor,  the  augmented  matrix  [R  t]  contains  the  ex¬ 
trinsic  parameters,  and  the  matrix  A  contains  the  intrinsic  parameters.  Comparing 
Equation  (2.28)  with  Equation  (2.31)  yields 

1  olc  0 

A  =  Tpcix  0  1  0  (2.32) 

0  0  1 


and 


fM 

a=~fH 

(2.33) 

M 

C  =  -ac/  — 

(2.34) 

M+l  . 

u0=  2 

(2.35) 

(2.36) 

N+l  . 

Vo=  2 

(2.37) 
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Additionally,  since  the  extrinsic  parameters  describe  the  relative  rotation  and 
translation  between  the  3-D  c-frame  and  the  2-D  pix- frame,  the  augmented  matrix 
[R  t]  is  of  the  form 


u  i 

02 

0.3 

T  21 

T  22 

f  23 

ty 

r  n 

T 12 

f  13 

tz 

0 

0 

0 

1 

(2.38) 


Since  the  3-D  points  to  be  used  for  calibration  all  lie  on  a  planar  surface,  Zhang 
begins  by  defining  extrinsic  parameters  relative  to  the  caZ-frame  such  that  all  3-D 
points  lie  on  z  —  0.  Since  all  points  lie  on  the  same  plane,  Zhang  drops  the  third 
coordinate  and  reduces  all  3-D  point  coordinates  to  2-D,  yielding 


sm  =  HM  (2.39) 

H  =  A[ri  r2  t]  (2.40) 


where  H  is  a  3  x  3  homography  matrix  defined  up  to  a  scale  factor.  After  estimating 
H  from  a  single  set  of  model  (M)  and  image  (m)  points,  Zhang  uses  the  fact  that  the 
column  vectors  ri  and  r2,  which  compose  H.  are  orthonormal  to  define  the  following 
additional  constraints  on  the  intrinsic  parameters 

hf  A“rA-1h2  =  0  (2.41) 

h[  A-TA"1^  =  h2A~TA_1h2  (2.42) 


where  the  notation  —  T  represents  the  inverse  and  transpose  operations  and  the  col¬ 
umn  vector  h*  is  the  ith  column  of  H.  Using  the  two  constraints  given  above,  Zhang 
then  solves  for  the  extrinsic  and  intrinsic  parameters  by  observing  the  same  planar 
surface  from  at  least  two  views,  which  must  differ  by  both  rotation  and  translation. 
Observing  more  than  two  views  of  the  planar  surface  creates  an  overdetermined  system 
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that  can  be  solved  using  least-squares.  Creating  an  overdetermined  system  accounts 
for  noise  in  the  collection  process.  Further  information  on  the  closed  form  (analytical) 
and  iterative  (maximum  likelihood)  solutions  to  the  calibration  system  can  be  found 
in  Zhang’s  technical  report  [24], 

2. 5. 2. 2  Caltech  Camera  Calibration  Toolbox.  Bouguet  [1]  makes  exten¬ 
sive  use  of  Dr.  Zhang’s  calibration  method  to  develop  a  practical  camera  calibration 
user  interface  for  Mat  lab®.  Bouguet  uses  the  exact  homography  estimation  tech¬ 
nique  described  by  Zhang  but  chooses  to  use  the  orthogonality  property  of  vanishing 
points  [16]  to  estimate  the  intrinsic  parameters.  Finally,  Bouguet  implements  ad¬ 
ditional  algorithms  [7]  to  estimate  the  tangential  distortion  coefficients.  Although 
the  Camera  Calibration  Toolbox  developed  by  Bouguet  has  become  one  of  the  most 
widely  used  tools  in  camera  calibration  among  computer  vision  research  facilities,  its 
mode  of  operation  is  unnecessarily  manual  and  prone  to  user  error.  One  of  the  goals 
of  this  research,  which  will  be  discussed  in  the  next  chapter,  is  to  automate  Bouguet ’s 
toolbox. 

Using  a  common  reference,  the  extrinsic  results  for  each  camera  in  a  system  are 
compared  in  order  to  compute  their  relative  rotation  and  translation.  To  do  so,  the 
cameras  and  the  calibration  board  must  remain  stationary  (relative  to  one  another) 
while  capturing  each  view  from  multiple  cameras.  Another  goal  of  this  research  is 
to  develop  a  robust  multi-camera  calibration  algorithm  that  can  be  used  to  calibrate 
multi-camera  systems  with  non-overlapping  fields  of  view.  Further  development  is 
presented  in  the  next  chapter. 

2.5.3  Binocular  Stereopsis.  Using  the  epipolar  geometry  illustrated  in  Fig¬ 
ure  2.16,  the  3-D  location  of  a  point  can  be  calculated  by  observing  it  from  two 
different  cameras  with  known  rotation  and  translation  (obtained  during  calibration). 
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Figure  2.16:  Binocular  imaging  geometry.  The  3-D  position  of  a  point  in 

the  n-frame  can  be  extracted  from  its  projections  onto  the  image  planes  of  two 
cameras  with  known  relative  rotation  and  translation. 

The  epipolar  geometry  constraints  are  given  by 


Pr  =  R(P'  -  T)  (2.43) 

p'  =  | Pl  (2.44) 

Zi 

pr  =  | -Pr  (2.45) 

where  T  and  R  represent  the  relative  rotation  and  translation  between  the  left  and 
right  cameras,  fi  and  fr  are  the  respective  focal  lengths  of  each  camera,  the  vectors 
pr  and  p;  represent  the  projections  of  the  world  point  P  onto  each  image  plane,  and 
the  vectors  Pr  and  P1  represent  the  coordinates  of  the  world  point  P  on  each  c-frame. 
Starting  with  the  left  image  of  a  point,  a  1-D  search  along  the  epipolar  line  on  the  right 
image  can  be  conducted  in  order  to  extract  the  distance  to  the  point  thus  establishing 
its  full  3-D  location.  Binocular  imaging  geometry  is  used  extensively  in  determining 
the  3-D  location  of  each  landmark  within  the  image-aided  Extended  Kalman  Filter 
presented  in  Section  2.7.  Additional  information  on  epipolar  geometry  and  binocular 
stereopsis  can  be  found  in  Szeliski’s  textbook  [19]. 
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2.6  Kalman  Filtering 

The  Kalman  filter,  developed  by  Rudolf  Kalman  in  1960  [10],  provides  a  method 
for  the  optimal  combination  of  measurements  made  by  multiple  sensors  (e.g.,  inertial 
and  imaging).  The  Kalman  filter  uses  Bayesian  statistics  to  combine  dynamics  and 
measurements  models,  which  provides  a  solution  estimate  with  the  lowest  possible 
uncertainty.  This  section  outlines  the  basic  principles  behind  the  Kalman  filter  as 
outlined  by  Maybeck  [13]  [14]. 

2.6.1  Linear  Kalman  Filter.  The  physical  system  dynamics  are  modeled 
using  the  form 

~k.it)  =  Fx(f)  +  Bu(f)  +  Gw(f)  (2.46) 

where  x  is  a  vector  containing  the  system  states  of  interest,  u  is  a  vector  containing 
system  control  inputs  and  w  is  a  vector  of  white  Gaussian  noise  sources  with 

E[w(t)]  =  0  (2.47) 

E[w(f)w(f  +  t)\  =  Q6(t)  (2.48) 

while  the  matrices  F.  B  and  G  contain  constant  coefficients,  which  specify  linear 
combinations  of  the  vectors  they  multiply. 

In  order  to  implement  the  Kalman  filter  algorithm  in  a  computer  system,  the 
continuous-time  model  must  be  discretized  to  account  for  system  propagation  between 
samples.  The  discrete  process  noise  strength  matrix  and  the  discrete  control 
input  matrix  are  obtained  by  changing  the  limits  of  integration  to  capture  a  single 
time  step  At  within  the  general  solution  to  the  system,  which  is  given  shown  by 
Maybeck  [14]  and  VanLoan  [21].  Additionally,  the  discrete  state  transition  matrix, 
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which  is  used  to  propagate  system  states  and  derived  from  the  system  dynamics 
model,  is  given  by 


$  =  eFAt  (2.49) 

Linear  discrete  measurements  from  the  various  sensors  are  modeled  by 

z  (U)  =  Hx(tj)  +  vfc)  (2.50) 

where  z  is  a  vector  of  sensor  measurements  and  v  is  a  vector  of  discrete-time,  white 
Gaussian  noise  sources  with 


E[v(tj)]  =  0  (2.51) 

E  [v(tj)v(tj)]  =  R ,Si:j  (2.52) 

and 

E[w(tj)v(tj)]  =  0  (2.53) 

while  the  matrix  H  contains  constant  coefficients,  which  specify  linear  combinations  of 
the  system  state  vector  x.  Since  the  system  is  fully  linear,  the  Kalman  filter  algorithm 
guarantees  a  Minimum  Mean  Squared  Error  (MMSE)  optimal  solution  for  estimating 
the  system  states. 

The  quantities  of  interest  estimated  by  the  Kalman  filter  are  contained  within 
the  random  vector  x.  The  Kalman  filter  provides  the  probability  density  function  for 
x  at  each  discrete  time  step,  conditioned  on  noise  corrupted  measurements  provided 
by  sensors.  The  Kalman  filter  algorithm  begins  with  initial  conditions,  which  include 
the  initial  state  estimate  vector  x  and  its  uncertainty,  which  is  contained  by  the 
covariance  matrix  Pxx.  The  initial  conditions  are  propagated  from  one  discrete  time 
step  to  the  next  using  the  discrete  state  transition  matrix  such  that 
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x(ti+1)  =  $x(X+)  +  Bdu(ti) 

P  xx(t“+1)  =  #Pxx(t+)$T  +  Qd 


(2.54) 

(2.55) 


As  linear  measurements  become  available  at  discrete  time  intervals,  the  propa¬ 
gated  state  estimates  and  their  covariance  are  optimally  combined  with  the  incoming 
measurements  using  the  Kalman  gain  matrix  K,  which  is  given  by 

K  (U)  =  Pxx(t-)HT[UPxx(tr)UT  +  R]'1  (2.56) 

The  state  estimates  and  covariances  are  updated  with  the  Kalman  gain  matrix 
from  Equation  (2.56)  using 


x(t+)  =  x(f.  )  +  K(fj)[z(f,;)  -  Hx(t.  )]  (2.57) 

P^(^+)  =  P xx(t~)  -  K(ij)HPxx(i“)  (2.58) 

As  shown  in  Equations  (2.57)  and  (2.58),  the  Kalman  gain  matrix  serves  as  an 
optimal  weighting  factor  that  gives  adequate  preference  to  either  the  propagated  or 
measured  estimates,  given  their  individual  uncertainties,  in  order  to  minimize  mean 
squared  error. 

2.6.2  Extended  Kalman  Filter.  If  a  particular  system  cannot  be  adequately 
represented  using  linear  dynamics  or  measurement  models,  the  linear  Kalman  filter 
algorithm  does  not  guarantee  optimal  solutions.  However,  in  certain  cases,  linear 
approximations  to  nonlinear  systems  can  still  yield  accurate  estimates.  In  such  cases, 
the  Extended  Kalman  Filter  (EKF)  is  used.  The  basic  system  dynamics  equation  for 
a  nonlinear  system  is  given  by 

x(f)  =  f  [x(t),  u(t),  t]  +  G(f)w(f)  (2.59) 
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where  f  is  a  vector  containing  functions  which  represent  the  system.  In  turn,  the 
nonlinear  measurement  equation  is  given  by 

z  (U)  =  h[x(tj),  ti\  +  w{ti)  (2.60) 


where  h  is  a  vector  of  functions  which  model  the  sensor.  The  main  goal  is  to  linearize 
nonlinear  models  about  their  nominal  estimates  in  order  to  use  the  conventional  linear 
Kalman  update  equations.  To  do  so,  the  states  are  redefined  using  the  perturbation 
model  given  by 

hx(f)  =  x(f)  —  x(f)  (2-61) 

where  <5x(f)  represents  the  difference  between  the  true  state  vector  and  its  estimate. 
In  order  to  propagate  the  system  from  initial  conditions  or  a  previous  measurement 
to  the  time  of  the  next  measurement,  the  EKF  integrates  the  nonlinear  dynamics 
function  over  the  discrete  time  difference  using 

x(t"+1)  =  /  f[x(f),u(f),f]df +  x(f+)  (2.62) 


while  the  state  covariance  matrix  is  propagated  using  Equation  (2.49),  Equation  (2.55) 
and  a  linearized  dynamics  model  matrix  F  given  by 


F(U) 


(2.63) 


In  order  to  update  the  propagated  state  estimates  using  an  incoming  (possibly 
nonlinear)  measurement,  the  measurement  must  first  be  predicted  by  evaluating  the 
measurement  model  function  with  the  most  recent  estimate  using 


z pred(ti)  =  h[x(f.  ),ti]  (2.64) 

hz(tj)  z(tj)  Z  pred(ti)  (2.65) 
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where  Sz  is  called  the  measurement  perturbation  and  represents  the  difference  between 
the  actual  and  predicted  measurements. 

In  order  to  combine  the  propagated  and  measured  state  estimates,  the  nonlinear 
measurement  function  h  is  linearized  to  obtain  H(U)  in  a  similar  fashion  to  F (U)  using 

(2.66) 

*(*r ) 

The  linearized  matrix  H  is  then  used  in  Equation  (2.56)  to  obtain  a  Kalman 
gain  matrix,  and  the  measurement  update  equation  reduces  to 
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6Z(tl+1)  =  K(ti+1)8z(ti+1)  (2.67) 

due  to  the  use  of  perturbation  state  estimates  and  measurements.  The  perturbation 
state  <5x,  which  starts  at  zero  during  each  filter  recursion,  is  updated  using  Equa¬ 
tion  (2.67)  and  added  to  the  nominal  trajectory  to  produce  a  nominal  estimate.  Prior 
to  the  next  recursion,  the  perturbation  state  is  reset  to  zero.  Since  the  EKF  does  not 
guarantee  optimal  solutions,  it  must  often  be  tuned  prior  to  use  by  adding  process 
noise  and  selecting  specific  initial  conditions.  Tuning  increases  filter  stability  and 
usually  increases  solution  uncertainty. 

2.7  Image-aided  Navigation 

Veth  [22]  describes  a  novel  approach  to  image-aided  navigation  in  which  features 
are  tracked  within  an  image  over  time  in  order  to  correct  inertial  navigation  errors. 
At  the  same  time,  inertial  sensor  measurements  are  used  to  correct  errors  within  the 
visual  feature  tracking  algorithm.  Using  a  so-called  stochastic  feature  tracker,  Veth 
presents  a  deeply-coupled  inertial  and  visual  navigation  algorithm  that  uses  rigorous 
stochastic  developments  in  order  to  integrate  the  two  systems. 
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Table  2.1:  Image-aided  navigation  parameters. 

This  table  summarizes  the  major  parameters  in  Veth’s 
image- aided  navigation  algorithm  [22], 


Parameter 

Description 

pn 

Vehicle  position  in  navigation  frame 

vn 

Vehicle  velocity  in  navigation  frame 

^ b 

Vehicle  body  to  navigation  frame  DCM 

ab 

Accelerometer  bias  vector 

bb 

Gyroscope  bias  vector 

\.n 

Lra 

Location  of  landmark  m  in  the  navigation  frame 

db 

Camera-to-body  lever  arm  in  body  frame 

cb 

Camera-to-body  orientation  DCM  in  body  frame 

y 

Vector  of  landmark  locations  in  navigation  frame 

2.7.1  Algorithm  Description.  The  major  system  parameters  involved  in  the 
algorithm  are  listed  in  Table  2.1.  Veth  uses  an  EKF  to  periodically  update  the  error 
estimates  in  the  system  parameters,  which  can  be  grouped  into  navigation  parame¬ 
ters  (position,  velocity,  attitude),  inertial  sensor  biases  and  a  vector  (y)  describing 
the  location  of  the  landmarks  of  interest.  Throughout  the  EKF,  navigation  param¬ 
eters  are  calculated  using  bias-corrected  inertial  measurements  (vehicle  velocity  and 
angular  increment)  and  used  to  propagate  the  error  estimates  through  the  strapdown 
mechanization  equations  described  by  Titterton  and  Weston  [20].  The  navigation  pa¬ 
rameters  are  modeled  as  stochastic  random  processes  while  the  inertial  sensor  biases 
are  modeled  as  first-order  Gauss-Markov  processes,  which  are  based  on  manufacturer 
specifications.  Finally,  the  landmarks  used  in  visual  feature  tracking  are  modeled  as 
stationary  objects  with  respect  to  the  Earth,  with  a  small  amount  of  process  noise 
added  for  filter  stability.  Figure  2.17  illustrates  Veth’s  overall  algorithm. 

2.7.2  Landmark  Track  Maintenance.  Due  to  computer  limitations  and  on¬ 
line  implementation  practicalities,  Veth  constrains  the  number  of  features  that  are 
tracked  at  any  particular  time  through  the  use  of  a  track  maintenance  algorithm. 
In  general,  a  maximum  and  minimum  number  of  tracks  is  set,  and  features  are  con¬ 
stantly  “pruned”  in  order  to  provide  the  EKF  the  best  information  possible.  In  his 
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Figure  2.17:  The  3-D  locations  of  stationary  features  are  tracked  and  used 
to  provide  measurement  updates  in  order  to  estimate  and  correct  inertial  navi¬ 
gation  errors.  In  turn,  the  inertial  navigation  system  is  used  to  support  feature 
tracking  [22], 

algorithm,  Veth  uses  SIFT  to  generate  features  of  interest  because  they  are  easy  to 
identify,  locally  distinct  from  other  features  and  well  separated  in  image-space.  The 
main  purpose  of  the  track  maintenance  algorithm  is  to  remove  “stale”  tracks  when 
no  correspondence  is  found  for  extended  periods  of  time. 

2.7.3  Measurement  Model.  One  of  the  most  crucial  developments  in  Veth’s 
algorithm  [22]  is  the  use  of  rigorous  stochastic  projections  to  propagate  features  be¬ 
tween  images.  Veth  describes  a  tracking  loop  algorithm  responsible  for  incorporat¬ 
ing  new  landmark  tracks,  predicting  and  matching  feature  locations  between  images 
through  the  use  of  stochastic  projections  and  providing  actual  filter  measurements  to 
the  EKF.  In  turn,  the  EKF  assists  the  tracking  loop  by  providing  mean-square  error 
state  estimates  from  which  stochastic  projections  are  derived. 

The  tracking  loop  incorporates  new  tracks  (as  necessary)  by  estimating  the  3-D 
location  of  the  feature  using  a  combination  of  binocular  stereopsis  and  the  current 
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Figure  2.18:  Features  of  interest  are  propagated  into  future  images  using 

inertial  measurements  and  stochastic  projections  [22], 

navigational  state  vector.  The  vector  containing  feature  locations,  along  with  its 
associated  covariance  matrix,  is  augmented  into  the  EKF  using  the  system’s  stochastic 
properties  [23] .  As  the  EKF  propagates  the  state  estimate,  the  location  of  the  features 
in  a  future  image  are  predicted  along  with  an  uncertainty  ellipse,  which  is  derived 
from  the  propagated  covariance  matrix.  Finally,  the  landmark  correspondence  search 
can  be  restricted  based  upon  the  pix- frame  projection  of  each  landmark’s  location 
uncertainty  in  order  to  speed  up  the  tracking  process.  Once  (and  if)  a  match  is 
found,  the  pixel  location  of  the  matched  features  is  used  to  update  the  navigation 
state.  Figure  2.18  illustrates  the  stochastic  projection  process. 

2.8  Summary 

This  chapter  has  laid  the  basic  foundation  upon  which  this  thesis  is  based.  Top¬ 
ics  including  nomenclature,  reference  frames,  inertial  navigation,  digital  image  pro¬ 
cessing,  camera  calibration,  Kalman  filtering  and  image-aided  navigation  have  been 
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explored.  The  next  chapter  of  this  research  describes  how  these  concepts  were  used  to 
develop  an  improved  image-aided  navigation  algorithm  that  includes  automatic  cam¬ 
era  calibration  and  accounts  for  affine  distortions  such  as  those  presented  by  cameras 
with  non-overlapping  fields  of  view. 
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III.  Methodology 


This  chapter  presents  the  methods  used  to  generate  a  solution  to  the  problems 
described  in  Chapter  I,  as  well  as  the  experiments  used  to  evaluate  the  solution. 
This  chapter  is  divided  into  two  sections:  algorithm  development  and  experimental 
methods.  The  algorithm  development  for  automatic  calibration  is  presented  in  Section 
3.1.1.  The  affine  distortion  prediction  process  is  presented  in  Section  3.1.2.  Finally, 
the  experimental  methods  used  to  validate  the  major  research  contributions  of  this 
thesis  are  presented  in  Section  3.2. 

3.1  Algorithm  Development 

3.1.1  Automatic  Calibration  .  This  section  describes  the  procedures  used 
to  generate  an  automated  calibration  algorithm  based  on  the  manual  calibration  pro¬ 
cesses  described  in  Section  2.5.  The  automatic  calibration  algorithm  developed  in  this 
section  removes  the  need  for  user  interaction  and  complements  the  existing  Matlab® 
toolbox. 

As  previously  stated,  the  current  calibration  process  relies  heavily  on  user  in¬ 
volvement.  Although  current  computer  vision  algorithms  are  very  efficient  at  recog¬ 
nizing  corners  within  an  image  [6],  there  are  currently  no  mature  methods  for  sepa¬ 
rating  a  standard  calibration  board,  such  as  the  one  in  Figure  1.2,  from  the  image 
background.  Because  of  this,  the  current  calibration  process  starts  off  by  a  manual 
delineation  of  the  calibration  board  perimeter  for  every  calibration  image.  Addition¬ 
ally,  the  user  is  asked  for  the  size  of  each  square  and  the  number  of  squares  along  each 
axis.  As  the  user  defines  the  board  perimeter  within  each  image,  the  toolbox  builds 
a  set  of  coordinate  correspondences  between  points  in  the  pix- frame  (m)  and  points 
in  the  caZ-frame  (M).  Once  enough  correspondences  are  established,  the  toolbox  can 
run  through  a  gradient  descent  algorithm  to  find  the  best  solution  for 


m  =  HM 


(3.1) 
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Figure  3.1:  Standard  (manual)  calibration  process.  During  standard  cali¬ 

bration,  the  user  is  heavily  involved  in  the  board  delineation  and  square  size 
input  steps. 

where  the  homography  matrix  H  contains  the  camera  model  and  transformation  ma¬ 
trix  that  takes  points  from  the  caZ-frame  into  the  pix-frame. 

As  illustrated  by  Figures  3.1  and  3.2,  the  manual  steps  from  the  standard  al¬ 
gorithm  (color-coded  red)  are  replaced  with  proven  automated  computer  functions 
(color-coded  blue)  in  order  to  eliminate  the  need  for  user  interaction.  First,  the  stan¬ 
dard  calibration  board  is  replaced  by  an  arbitrary  image  for  which  distinct  SIFT 
features  can  be  found  and  tracked.  SIFT  matches  can  then  be  automatically  found 
between  the  physical  board  and  a  digital  copy  with  no  need  for  user  interaction.  A 
sample  feature  match  set  from  an  experimental  calibration  is  shown  in  Figure  3.3. 

Having  automatically  found  the  pix- frame  points  m  on  the  right  image,  their 
corresponding  caZ-frame  coordinates  can  be  computed  through  a  transformation  ma- 
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Figure  3.2:  Automatic  calibration  process.  During  automatic  calibration, 

user  involvement  is  drastically  reduced  by  changing  the  calibration  board  and 
replaced  by  automatic  feature  recognition. 
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where  the  set  of  points  M  have  coordinates  which  are  measured  in  pixels  and  the 
set  of  points  M  are  measured  in  meters,  as  required  by  the  calibration  toolbox.  It  is 
important  to  note  that  since  all  caZ-frame  points  are  coplanar,  their  third  coordinate 
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Figure  3.3:  Automatic  calibration  matches.  Board  delineation  and  corner 

detection  are  replaced  by  automatic  SIFT  feature  matching  between  an  image 
printout  and  its  digital  copy.  The  left  image  shows  digital  caZ- frame  coordi¬ 
nates  M,  while  the  right  image  shows  the  corresponding  pix- frame  points  m. 

can  be  assumed  to  be  zero  (z  =  0),  which  reduces  the  transformation  matrix  from 
three  to  two  dimensions.  The  coefficients  that  populate  the  transformation  matrix 
in  Equation  (3.2)  are  computed  by  comparing  the  pixel  and  meter  coordinates  of  at 
least  three  known  points  in  a  least-squares  algorithm.  The  corners  of  the  white  square 
shown  in  Figure  3.4  illustrate  four  known  points  selected  in  both  the  digital  board 
and  the  printed  board. 

Having  defined  the  M  and  m  vectors  without  user  interaction,  the  standard 
calibration  toolbox  can  be  used  to  execute  the  gradient  descent  computations  and 
complete  the  calibration  process.  The  results  from  an  automatic  calibration  routine 
are  compared  to  the  standard  calibration  process  in  terms  of  total  time  and  reprojec¬ 
tion  error  in  Chapter  IV. 

3.1.2  Affine  Distortion  Prediction  .  This  section  describes  the  methods  and 
algorithms  used  to  develop  a  predictive  algorithm  that  accounts  for  affine  distortions 
in  SIFT  feature  descriptors,  which  arise  from  changes  in  3-D  viewpoint.  The  pre¬ 
dictive  algorithm,  referred  to  from  here  on  as  Affine  Distortion  Prediction  (ADP),  is 
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Figure  3.4:  Pixel-to-board  frame  mapping.  In  order  to  convert  digital  cal- 
frarne  coordinates  (pixels)  to  actual  printed  cai-frame  coordinates  (meters)  for 
all  automatic  SIFT  matches  found,  a  transformation  matrix  is  constructed  by 
comparing  the  digital  and  printed  caZ-frame  coordinates  of  four  known  points 
(corners  of  the  white  square). 

then  deeply  coupled  into  the  existing  Image-aided  Extended  Kalman  Filter  (IAEKF), 
originally  developed  by  Veth  [22], 

3. 1.2.1  Effects  of  Affine  Distortion  on  Feature  Matching.  In  order  to 
fully  understand  the  need  for,  and  advantages  provided  by  ADP,  the  effects  of  affine 
distortions  in  feature  tracking  effectiveness  must  be  examined.  As  previously  stated, 
one  of  the  key  requirements  in  image-aided  navigation  is  the  ability  to  detect  and 
track  features  over  many  frames  of  an  image  sequence.  Although  SIFT  descriptors 
allow  feature  matching  through  changes  in  scale  (zoom),  rotations  about  the  c- frame’s 
2-axis  (2-D  rotations),  and  illumination,  feature  tracking  effectiveness  diminishes  with 
rotations  about  the  c-frame’s  x  and  y  axes  (3-D  rotations)  [18].  Figure  3.5  illustrates 
a  3-D  rotation  between  two  images  of  the  same  scene.  Using  the  conventional  SIFT 
descriptors  found  for  both  images  and  a  NNDR  of  0.45,  there  was  only  one  positive 
match  found.  Next,  Figure  3.6  illustrates  a  drastic  increase  in  SIFT  matches  between 
the  two  images  when  an  affine  transformation  is  artificially  applied  to  the  original  flat 
image.  Finally,  Figure  3.7  illustrates  the  locations  of  the  matches  found  in  Figure  3.6 
backprojected  onto  the  original  flat  image  to  produce  the  final  match  set. 
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3. 1.2. 2  Simulating  Affine  Distortions.  Having  established  the  poten¬ 
tial  improvement  offered  by  simulating  affine  distortions,  the  mathematics  involved 
can  be  developed.  The  process  begins  with  a  flat,  grayscale  digital  image  I  with  height 
M  and  width  A.  The  grayscale  value  stored  in  the  (i,j)  element  of  the  matrix  I  can 
be  thought  of  as  a  2-D  point  with  coordinates  (j  + 1,  i  +  1)  in  the  pix-frame.  An  affine 
distortion  to  I  can  be  simulated  by  applying  a  3-D  DCM  A)  about  the  image  center 
to  every  element  in  I  such  that 
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where  the  new  elements  (i,j)  compose  the  simulated  image  I,  which  is  the  resulting 
affine-distorted  version  of  I.  Note  that  in  order  to  apply  a  3-D  transformation  matrix 
onto  a  2-D  image,  a  third  dimension  must  be  created  for  every  element  in  I.  Since  all 
the  elements  of  I  are  assumed  to  be  coplanar,  a  third  coordinate  of  z  =  0  can  be  set  for 
all  points  in  the  image  prior  to  applying  A \.  The  DCM  A/  can  be  built  from  desired 
changes  in  line-of-sight  azimuth  (if)  and  elevation  (6)  using  Equation  (2.2).  Since 
SIFT  descriptors  are  already  invariant  to  rotations  about  the  line-of-sight  vector,  <f 
is  always  set  to  zero. 

Simulating  an  affine  distortion  to  the  original  image  is  essentially  equivalent  to 
resampling  I  along  the  major  axes  of  A).  Therefore,  the  final  step  in  this  process 
consists  of  applying  a  low-pass  2-D  Gaussian  filter  on  I  to  prevent  undesired  aliasing. 
A  2-D  low-pass  Gaussian  filter  can  be  applied  to  an  image  X  using 


Xfilt  =  X  ©  h(ni,n2) 

(3.6) 

,/  x  hg(nlln2) 

h(ni,n2)  = 

nl  n  2 

(3.7) 

hg(nun2)  =  e-G?+«i)/(2-2) 

(3.8) 
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where  ri\  and  n2  represent  the  dimensions  of  the  Gaussian  filter  and  a  represents  its 
standard  deviation.  Figure  3.8  illustrates  the  simulated  affine  distortion  output  image 
for  four  different  combinations  of  change  in  line-of-sight  azimuth  and  elevation. 

3. 1.2. 3  Transforming  SIFT  Descriptors.  With  a  method  for  simulat¬ 
ing  affine  distortions  in  place,  a  process  for  transforming  the  SIFT  descriptor  of  a 
particular  feature  can  be  developed.  Transforming  the  SIFT  descriptor  of  a  specific 
feature  differs  from  the  process  shown  in  Section  3. 1.2.1  and  Figure  3.6.  Specifically, 
one  is  no  longer  simulating  an  affine  distortion  on  the  input  image  and  checking  for 
all  possible  matches,  but  rather  trying  to  obtain  a  new  (affine-distorted)  descriptor 
for  a  single  feature.  The  reason  for  this  difference  is  that  in  order  to  integrate  ADP 
into  the  IAEKF,  SIFT  feature  descriptors  need  to  be  “propagated”  to  account  for 
changes  in  relative  viewpoint  between  the  vehicle  and  the  features  it  tracks. 

The  entire  SIFT  descriptor  transformation  process  is  illustrated  in  Figure  3.9. 
Again,  the  process  begins  with  a  flat  input  image  I,  a  feature  of  interest  fm,  which  is 
described  by  pix- frame  coordinates  (xm,ym),  SIFT  scale  am,  SIFT  rotation  rm  and 
SIFT  descriptor  dm.  The  goal  is  to  use  a  desired  change  in  azimuth  (A0)  and  elevation 
(A 6)  in  order  to  transform  dm  and  produce  drn. 

The  first  step  is  to  simulate  the  affine  distortion  dictated  by  Af  and  A 6  on  I 
using  Equation  (3.5).  Since  the  affine  distortion  simulation  equation  applies  to  all 
points,  it  can  be  used  to  project  the  coordinates  of  fm  ( xm ,  ym,  0)  along  with  the  rest 
of  the  image.  The  affine  distortion  simulation  process  significantly  alters  the  scale 
and  rotation  of  the  entire  image,  which  changes  both  rm  and  am.  This  change  makes 
it  impossible  to  manually  compute  a  new  SIFT  descriptor  from  I,  which  is  the  affine- 
distorted  version  of  I,  at  fm,  which  is  the  projected  location  of  fm  in  the  new  image. 
Instead,  the  SIFT  algorithm  described  in  Section  2.4.2  is  applied  to  I  to  produce 
a  new  set  of  SIFT  features  G.  Finally,  the  nearest  neighbor  in  G  to  fm  is  found 
using  Euclidean  distances  and  called  gn.  It  is  then  assumed  that  if  the  threshold  for 
accepting  a  close  neighbor  match  is  low  enough,  dm  can  be  taken  directly  from  the  new 
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feature.  That  is,  the  “transformed”  descriptor  for  fm  is  actually  the  descriptor  from 
the  new  feature  gn.  If  a  new  feature  is  not  found  within  the  established  threshold, 
the  SIFT  descriptor  is  said  to  be  non-transformable  for  the  given  A <f>  and  A 6  and 
the  original  descriptor  is  returned.  Figures  3.10  through  3.12  illustrate  the  descriptor 
transformation  algorithm  described  in  this  section. 

3. 1.2.4  Deep  Coupling  into  IAEKF.  Having  developed  an  algorithm 
for  transforming  SIFT  descriptors  using  desired  changes  in  azimuth  and  elevation, 
ADP  can  now  be  integrated  into  the  IAEKF  in  order  to  improve  long-baseline  feature 
tracking  performance.  The  first  step  involves  understanding  the  nature  of  changes  in 
visual  information  through  vehicle  movement.  As  shown  in  Figure  3.13,  assuming  a 
forward- moving  trajectory,  visual  elements  which  initially  are  characterized  by  high 
affine  and  projective  distortions  become  “ffatter”  as  the  vehicle  approaches.  That  is, 
in  general,  additional  information  about  particular  objects  is  gained  in  a  scene  as  one 
gets  closer  to  them. 

A  similar  characteristic  can  be  attributed  to  the  affine  distortion  simulation  al¬ 
gorithm  described  in  3. 1.2.2.  Note  that  affine  distortion  works  by  resampling  initial 
visual  information,  which  actually  reduces  the  amount  of  information,  to  approximate 
the  effects  of  high  affine  change.  Affine  distortion  simulation  can  only  distort  currently 
available  visual  information,  and  more  importantly,  it  cannot  recover  missing  informa¬ 
tion.  In  other  words,  affine  distortion  simulation  cannot  be  used  to  “flatten”  images 
that  are  already  distorted;  it  can  only  further  distort  images. 

From  the  above  observations,  one  can  conclude  that  in  order  to  use  ADP  ef¬ 
fectively,  a  SIFT  descriptor  cannot  be  simply  propagated  forward  because  the  visual 
information,  which  was  hidden  by  the  high  affine  distortion,  cannot  be  recovered. 
Instead,  newer  measurements  (sets  of  descriptors)  must  be  propagated  backwards. 
Therefore,  newer  descriptors,  which  are  assumed  to  be  “flatter”  and  contain  more  in¬ 
formation,  need  to  be  transformed  to  match  current  descriptors,  which  are  assumed  to 
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be  distorted  and  contain  less  information.  Figure  3.14  illustrates  the  visual  operator 
space  problem  described  in  this  section. 

Having  established  the  correct  application  of  ADP  onto  SIFT  descriptors,  the 
mechanics  involved  in  the  deep  coupling  with  the  IAEKF  can  be  finally  set.  Fig¬ 
ure  2.17  describes  the  baseline  deep  coupling  algorithm  described  by  Veth  [22],  Inte¬ 
grating  ADP  into  the  existing  visual  aided  navigation  algorithm  can  be  thought  of  as 
not  only  predicting  where  (in  the  pix- frame)  features  will  appear  from  ti  to  t*+i,  but 
also  what  (in  terms  of  SIFT  descriptor  changes)  those  feature  will  look  like  at  f,;+1. 

The  algorithm  begins  with  the  binocular  initialization  of  K  landmarks  at  tt.  As 
previously  mentioned,  the  binocular  initialization  is  necessary  for  estimating  feature 
depth  and  is  only  needed  when  discovering  new  landmarks.  After  initializing  a  par¬ 
ticular  landmark  m  and  estimating  its  3-D  location  in  the  n-frame  t^,  the  landmark’s 
initial  descriptor  dm{tj)  and  relative  azimuth  'ipm(ti)  and  elevation  6m{ti )  are  recorded. 
The  landmark’s  relative  azimuth  and  elevation  are  computed  using 
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where  the  n-frame  vector  l”m  is  a  line-of-sight  vector  pointing  from  the  vehicle  to 
the  landmark  m,  and  the  vectors  t"  and  pn  represent  the  landmark  and  vehicle 
positions  in  the  n-frame  respectively.  In  short,  relative  azimuth  is  the  angle  between 
the  projection  of  l"m  onto  the  N-E  plane  and  the  IV-axis,  while  relative  elevation  is  the 
angle  between  l”m  and  the  N-E  plane.  Additionally,  the  computed  relative  azimuth  is 
modified  from  Equation  (3.10)  by  adding  or  subtracting  7 r  to  ensure  continuity  across 
the  entire  2n  circle.  Figure  3.15  illustrates  the  azimuth  and  elevation  angles  as  defined 
by  the  n-frame.  Note  that  relative  azimuth  and  elevation  are  purely  functions  of  the 
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relative  position  between  the  vehicle  and  the  features  it  tracks;  vehicle  attitude  is  not 
a  factor. 

Next,  using  the  mathematics  described  by  Veth  [22],  each  of  the  K  landmarks  is 
propagated  forward  in  time  from  tt  to  U+i.  The  pix- frame  location  of  each  landmark 
is  estimated  using  INS  measurements.  A  stochastically-defined  elliptical  search  area 
around  each  estimated  landmark  location  is  created  and  based  on  state  covariances 
as  shown  in  Figure  1.3.  Before  attempting  to  match  candidate  features  from  the 
image  at  0+1  (referred  to  as  measurements)  to  landmark  descriptors  from  0,  the 
ADP  algorithm  is  used  to  propagate  dm{tj)  to  (fi+i)  for  all  K  landmarks.  In  order 
to  do  so,  the  most  recent  computed  position  solution  is  used  to  calculate  0m0i+i)  and 
Omiti+i),  for  each  of  the  K  landmarks  being  tracked. 

Having  calculated  the  new  relative  azimuth  and  elevation  for  the  rnth  landmark, 
A '0m  and  A 9m  are  computed  using 


A 0m  =  4>m(ti+ 1)  -  0m (tf )  (3.12) 

A 9m  =  9,m(t~+l )  -  §m(tt)  (3.13) 

The  SIFT  descriptor  transformation  algorithm  described  in  Section  3. 1.2.3  is 
then  used  to  back-propagate  all  candidate  features  from  ti+i  to  0  as  illustrated  in 
Figure  3.14  while  keeping  track  of  their  corresponding  original  SIFT  descriptors.  Fi¬ 
nally,  a  NNDR  algorithm  is  used  to  identify  a  match  between  the  particular  landmark 
and  all  back-propagated  candidate  features  that  fall  within  the  search  window.  If  a 
positive  match  is  found,  iti+i)  is  then  defined  as  the  transformed  SIFT  descriptor 
that  was  positively  matched  while  d+(fi+i)  is  defined  as  the  corresponding  original 
descriptor.  This  process  is  repeated  for  all  K  landmarks  active  at  0.  The  entire  ADP 
algorithm  is  illustrated  in  Figure  3.16. 
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3. 1.2.5  Optimizing  Simplifications.  Not  all  magnitudes  of  Afim  and 
A 9m  warrant  the  use  of  ADP.  There  is  a  range  of  change  in  either  relative  azimuth  or 
elevation  that  can  be  neglected  clue  to  its  minimal  effect  on  SIFT  descriptors.  This 
range  was  experimentally  found  to  be  0  —  35  degrees  in  both.  Additionally,  it  was 
experimentally  established  that  distortion  simulations  exceeding  60  degrees  in  either 
azimuth  or  elevation  result  in  SIFT  descriptor  transformation  success  rates  below  5% 
and  therefore  can  be  neglected. 

The  ADP  algorithm  is  repeated  K  times  every  discrete  time  step,  often  with 
very  similar  affine  distortions.  Therefore,  measurement  back-projections  involving 
similar  Afi  and  A 6  combinations  for  a  particular  time  step  can  be  grouped  into  a 
single  back-projection  mode  in  order  to  eliminate  computational  redundancy.  Fig¬ 
ure  3.17  illustrates  a  sample  back-projection  set  (A-0,  A 9)  and  the  modes  they  were 
grouped  into.  Since  back-projection  involves  transforming  an  image  in  accordance 
with  a  particular  (A-0,A 6),  many  features  may  be  back-projected  using  the  same 
transformed  image. 

3. 1.2.6  Additional  Enhancements  to  the  IAEKF.  Finally,  several  im¬ 
portant  enhancements  were  implemented  into  the  existing  IAEKF  software: 

Side  Landmark  Manager:  In  order  to  track  landmarks  over  extended  periods  of 
time  using  side  cameras,  landmark  coast  time,  which  is  the  amount  of  time  a 
landmark  is  allowed  to  remain  active  in  the  absence  of  positive  matches,  has  to 
be  lengthened  so  that  landmarks  remain  active  while  they  transition  between 
front  and  side  cameras.  However,  a  longer  landmark  coast  time  increases  the 
wait  time  required  to  replace  a  stale  track.  This  is  exacerbated  under  condi¬ 
tions  in  which  the  forward-looking  scene  changes  quickly  (e.g.,  during  horizon¬ 
tal  turns).  Ultimately,  a  longer  coast  time  increases  the  probability  of  losing 
all  landmark  tracks  without  replacement,  which  effectively  eliminates  the  mea¬ 
surement  updates  provided  by  the  feature  tracker  to  the  IAEKF.  This  conflict 
between  landmark  coast  time  and  feature  tracker  effectiveness  was  solved  by 
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adding  a  second  feature  tracker,  which  receives  landmark  hand-offs  from  the 
standard  feature  tracker  as  it  deactivates  stale  features.  The  second  feature 
tracker  allows  newly  deactivated  landmarks  to  still  provide  measurement  up¬ 
dates,  if  a  positive  match  is  made  through  a  side  camera,  while  maintaining  a 
relatively  short  landmark  coast  time. 

SIFT  Descriptor  Update  :  In  the  original  implementation  of  the  IAEKF,  the  fea¬ 
ture  tracker  used  the  SIFT  descriptors  from  initial  landmark  activations  to  de¬ 
termine  matches.  If  a  positive  match  was  found,  the  SIFT  descriptor  was  not 
replaced  by  its  match,  which  is  by  definition  newer.  This  process  was  changed 
so  that  SIFT  descriptors  for  matched  landmarks  are  replaced  by  their  latest 
match.  This  frequent  SIFT  descriptor  update  increases  the  probability  of  track¬ 
ing  landmarks  over  extended  periods  of  time  since  it  reduces  the  relative  change 
between  reference  and  candidate  SIFT  descriptors.  Additionally,  this  frequent 
descriptor  update  can  be  used  to  account  for  descriptor  changes  arising  from 
small  incremental  changes  in  viewpoint. 

Lens  Distortion  Corrections:  Finally,  the  original  IAEKF  implementation  only 
accounted  for  lens  distortion  effects  when  correcting  landmark  locations  on  the 
pix-frame.  However,  lens  distortion  was  not  used  to  account  for  changes  in 
feature  appearance  as  a  function  of  pix-frame  location.  It  is  logical  to  assume 
that  the  SIFT  descriptor  for  a  particular  feature  changes  significantly  depend¬ 
ing  on  the  feature’s  location  within  the  pix-frame  due  to  radial  lens  distortion 
effects.  Therefore,  in  order  to  track  features  for  extended  periods  of  time,  and 
more  importantly,  as  they  change  locations  within  the  pix-frame,  all  incoming 
measurement  images  were  undistorted  using  the  camera  model  from  calibra¬ 
tion  prior  to  detecting  SIFT  features.  This  image  undistortion  step  increases 
the  probability  of  obtaining  a  positive  landmark  match  even  as  the  landmarks 
change  position  within  the  pix-frame. 
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3.2  Experimental  Methods 

This  section  describes  the  experimental  methods  used  to  evaluate  the  automatic 
calibration  and  affine  distortion  prediction  algorithms  developed  in  Section  3.1.  The 
topics  discussed  include  test  equipment,  algorithm  variables,  sensor  models  and  data 
collection  run  descriptions. 

3.2.1  Test  Equipment.  The  main  equipment  used  to  experimentally  ver¬ 
ify  both  algorithms  includes  a  set  of  3  consumer-grade  computer-vision  cameras 
from  Prosilica  and  the  MEMS-grade  MIDG  II  IMU  from  Microbotics.  Addition¬ 
ally,  portable  power  and  processing  (computer)  hardware  was  necessary  to  mobilize 
the  test  setup  and  collect  outdoor  data.  Finally,  the  tactical-grade  SPAN  INS/GPS 
system  from  Novatel  was  used  in  conjunction  with  the  main  test  equipment  in  order 
to  collect  high-fidelity  truth  data  for  error  computations.  Figures  3.18  and  3.19  il¬ 
lustrate  the  two  collection  rigs  used  during  this  research.  The  automatic  calibration 
algorithm  was  tested  by  calibrating  a  pair  of  consumer-grade  cameras  using  the  au¬ 
tomatic  algorithm  and  comparing  the  results  with  Bouguet’s  standard  method.  The 
ADP  algorithm  was  tested  by  collecting  outdoor  inertial  and  image  data  using  the 
two  collection  rigs  and  comparing  the  IAEKF  output  results  with  the  high-fidelity 
truth  source. 

3.2.2  Algorithm  Variables.  In  order  to  properly  test  the  ADP  algorithm, 
the  standard  variables  needed  to  run  the  IAEKF  were  tuned.  Table  3.1  summarizes 
all  the  variable  values  needed  to  repeat  the  experiments  performed  for  this  research. 
The  standard  variables  used  in  Veth’s  code  [22]  as  well  as  newly-added  ADP  variables 
are  included.  The  algorithm  variables  are  described  below. 

Coast  time  :  Time  an  active  track  is  allowed  to  remain  stale. 

Max  landmarks  :  Maximum  amount  of  tracks  allowed  at  any  moment. 

ADP  groups  :  Minimum,  maximum  and  separation  of  ADP  mode  groups. 
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Table  3.1:  Variable  values  for  IAEKF  algorithm. 

This  table  summarizes  the  variable  values  associated 
with  the  IAEKF  implementation  used  for  this  re¬ 
search. 


Variable 

Value 

Coast  time  (front) 

1  [s] 

Coast  time  (side) 

3  [s] 

Max  landmarks  (front) 

30 

Max  landmarks  (side) 

30 

ADP  groups 

±  0:5:60  [deg] 

ADP  threshold 

40  [deg] 

Minimum  feature  scale 

3 

Minimum  correlation 

0.95 

SIFT  match  NDDR 

0.45 

Pixel  error  STD 

2  [pix] 

SIFT  rotation  error  STD 

0.5  [pix] 

SIFT  scale  error  coefficient 

0.05  [pix] 

Radial  error  coefficient 

0.005  [pix] 

Stochastic  constraint  distance 

2.15  [pix] 

ADP  threshold  :  Minimum  viewpoint  change  in  an  ADP  group  for  algorithm  usage. 
Minimum  feature  scale  :  Minimum  SIFT  feature  scale  to  accept  a  new  track. 
Minimum  correlation  :  Minimum  descriptor  correlation  to  accept  match. 

SIFT  match  NNDR  :  NNDR  used  to  accept  feature  matches  in  the  IAEKF. 

Pixel  error  STD  :  Pixel  error  standard  deviation  in  measurement  model. 

SIFT  rotation  error  coefficient  :  Rotation  error  multiplier  in  measurement  model. 
SIFT  scale  error  coefficient  :  Scale  error  multiplier  in  measurement  model. 
Radial  error  coefficient  :  Radial  distortion  error  multiplier  in  measurement  model. 
Stochastic  constraint  distance  :  Baseline  distance  for  feature  match  search  area. 


3.2.3  Sensor  Models  and  Installation.  An  accurate  INS  model  was  devel¬ 
oped,  based  on  manufacturer  specifications  and  experimental  data,  in  order  to  ade¬ 
quately  propagate  the  EKF  states  during  processing.  Table  3.2  summarizes  the  main 
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Table  3.2:  MIDG  II  MEMS-grade  INS  model.  This 
table  summarizes  the  MIDG  II  INS  model  variables 
used  for  this  research. 


Variable 

Value 

Sampling  frequency 

Accelerometer  time  constant 
Gyroscope  time  constant 
Accelerometer  bias  STD 
Gyroscope  bias  STD 
Accelerometer  scale  factor 
Gyroscope  scale  factor 
Accelerometer  random  walk  STD 
Gyroscope  random  walk  STD 

50  [Hz] 

3600  [s] 

3600  [s] 

0.2  [m/s2] 

9  x  10”3  [rad/s] 

300  [ppm] 

150  [ppm] 

9  x  10-3  [m/s/y/s] 

9  x  10-4  [rad/y/s] 

characteristics  of  the  INS  model  used  during  this  research.  Table  3.3  summarizes  the 
standard  15-state  Pinson  error  model  [20],  which  was  used  in  conjunction  with  the 
INS  model  in  the  IAEKF.  Finally,  the  automatic  calibration  algorithm  described  in 
this  research  was  used  to  produce  camera  models  for  each  camera  as  summarized  in 
Table  3.4  as  well  as  the  extrinsic  sensor  installation  measurements  summarized  in 
Tables  3.5  and  3.6.  However,  as  it  will  be  later  pointed  out  in  Section  5.2.1,  the 
automatic  camera  calibration  algorithm  does  not  currently  compute  IMU-to-camera 
extrinsic  configurations.  The  extrinsic  configurations  computed  by  the  algorithm  are 
all  relative  to  Camera  1  (front  left).  Therefore,  the  IMU-to-camera  lever  arm  was 
manually  measured  for  Camera  1  and  the  two  sensors  were  assumed  to  be  perfectly 
aligned  in  terms  of  DCMs. 


Table  3.3:  Kalman  filter  state  definitions.  This 

table  summarizes  the  standard  15-state  Pinson  error 
model  [20]. 


State 

Variable 

Meaning 

Xi 

5pN 

North  position  error  [m] 

X2 

5pe 

East  position  error  [m] 

X3 

SpD 

Down  position  error  [m] 

X4 

5vn 

North  velocity  error  [m/s] 

X5 

Sve 

East  velocity  error  [m/s] 

x6 

5vd 

Down  velocity  error  [m/s] 

x 7 

5a 

North  attitude  error  [mrad] 

x8 

513 

East  attitude  error  [mrad] 

Xg 

S'y 

Down  attitude  error  [mrad] 

XlO 

5  f  xs 

x  accelerometer  bias  [m/sec2] 

Xu 

SfVs 

y  accelerometer  bias  [m/sec2] 

X\2 

SfZs 

z  accelerometer  bias  [m/sec2] 

X\3 

5ujxs 

x  gyro  drift  [rad/sec] 

xu 

5ujxs 

y  gyro  drift  [rad/sec] 

Xu 

5ujxs 

z  gyro  drift  [rad/sec] 
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Figure  3.5:  SIFT  matches  through  large  affine  change.  Although  SIFT 

descriptors  allow  for  feature  matching  through  changes  in  illumination,  2-D 
rotation,  and  zoom,  their  matching  effectiveness  quickly  decreases  during  3-D 
rotations.  SIFT  folds  one  match  between  the  two  images. 
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Figure  3.6:  SIFT  matches  with  simulated  affine  distortion.  If  a  3-D  rotation 
approximating  the  right  image  is  applied  to  the  original  left  image,  the  number 
of  SIFT  matches  drastically  increases  to  40. 
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Figure  3.7:  Backprojection  of  match  locations  onto  original  image.  Finally, 
the  match  locations  from  the  simulated  image  can  be  backprojected  onto  the 
original  flat  image  to  recover  the  final  match  locations  between  the  two  original 
images. 
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Figure  3.8:  Sample  affine  distortion  output  images  using  various  combinations  of 

desired  change  in  azimuth  and  elevation.  The  input  image  is  essentially  resampled 
along  the  major  axes  of  the  3-D  DCM  A 
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Figure  3.9:  SIFT  descriptor  transformation  algorithm. 


Figure  3.10:  Sample  flat  image  I  with  selected  SIFT  feature  fm. 
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Figure  3.11:  Sample  affine-distorted  image  with  new  SIFT  features.  Since 
the  image  scale  and  rotation  have  changed,  new  SIFT  descriptors  cannot  be 
manually  computed  at  the  exact  feature  location.  Instead,  the  entire  SIFT 
algorithm  is  used  to  re-detect  a  new  set  of  features  (G)  within  the  new  image. 
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Figure  3.12:  Sample  nearest  neighbor  feature  match.  The  SIFT  descriptor 
for  the  closest  new  feature  gn  to  fm  within  the  predefined  threshold  (2  pixels 
in  this  case)  is  chosen  as  the  “transformed”  descriptor  dm. 


66 


Figure  3.13:  Nature  of  visual  information  gain  over  time.  Initially,  there  is 
very  little  visual  information  available  about  the  face  of  the  object  parallel  to 
the  travel  path  (top).  As  the  vehicle  continues  on  a  straight  and  forward  tra¬ 
jectory,  the  amount  of  visual  information  available  increases  (bottom).  Affine 
distortion  simulation  is  capable  of  taking  the  bottom  image  and  approximating 
the  top  image  but  not  vice-versa. 
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Figure  3.14:  Illustration  of  descriptor  propagation  operator.  Since  the  de¬ 
scriptor  operator  O  cannot  be  applied  to  current  descriptors,  all  incoming 
measurements  are  back-propagated  using  ADP  (CD1).  Descriptor  matching  is 
done  on  the  left  ( “distorted”  7C128  )  and  the  nearest  match  is  followed  back  to 
the  right  (“flat”  7C128). 
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Figure  3.15:  Relative  azimuth  and  elevation  in  the  n-frame. 
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Figure  3.16:  ADP  algorithm  block  diagram. 
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Figure  3.17:  Sample  ADP  mode  grouping. 
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Laptop  computer 
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Front  camera  pair 


HG1700  SPAN 


Mobile  power  supply 


Figure  3.18:  Data  collection  push-cart  illustration.  The  push-cart  collection 
rig  was  manually  pushed  around  a  parking  lot  during  Run  1,  primarily  to 
collect  enough  data  to  tune  the  IAEKF  variables. 
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Front  camera  pair 


MIDGII  IMU 

r  /  ^ 


Computers  and  power 


Figure  3.19:  Data  collection  van  illustration.  The  van-mounted  collection 

rig  was  used  in  a  downtown  environment  during  Run  2  and  was  the  primary 
collection  source  for  this  research. 
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Table  3.4:  Camera  models  table.  This  table  summa¬ 
rizes  the  models  for  the  various  imaging  sensors  used 
for  this  research. 


125495 

122865 

101205 

107971 

102828 

115970 

Make 

Prosilica 

Prosilica 

Prosilica 

Prosilica 

Prosilica 

Prosilica 

Lens 

Pent ax 

Pent ax 

Pent ax 

Pent ax 

Pent ax 

Pent ax 

Focal  length  [mm] 

[1376,1374] 

[1361,1359] 

[1375,1373] 

[928,927] 

[935,933] 

[1106,1105] 

Principal  point  [pix] 

[604,522] 

[611,492] 

[633,532] 

[766,595] 

[797,559] 

[539,404] 

ki 

-0.119 

-0.113 

-0.116 

-0.094 

-0.118 

-0.113 

k2 

-0.188 

0.181 

-0.163 

-0.099 

0.168 

0.16 

k3 

-6.66  x  10-4 

-1.11  x  10"3 

-1.12  x  10-4 

-1.23  x  10"3 

9.04  x  10"3 

3.47  x  10"4 

k4 

1.54  x  10“4 

-1.05  x  10~3 

-1.66  x  10"4 

2.50  x  10"3 

-8.66  x  10~4 

-1.98  x  10”4 

Resolution  [pix] 

[960,1280] 

[960,1280] 

[960,1280] 

[1600,1200] 

[1600,1200] 

[1024,768] 

3.2.4  Data  Collection  Run  Descriptions.  The  automatic  calibration  algo¬ 
rithm  was  tested  indoors  using  a  single  data  collection  and  two  consumer-grade  cam¬ 
eras.  The  ADP  algorithm  was  tested  through  two  different  collection  rigs  (Figures 
3.18  and  3.19),  which  were  used  to  perform  two  different  collection  runs.  Run  1  was 
performed  in  a  parking  lot  using  the  rig  shown  in  Figure  3.18,  while  Run  2  was  per¬ 
formed  in  an  urban  downtown  using  the  rig  shown  in  Figure  3.19.  The  purpose  of  Run 
1  was  to  collect  a  short  amount  of  data  in  order  to  debug  and  tune  the  ADP-enhanced 
IAEKF  algorithm.  Having  properly  tuned  the  system,  Run  2  was  then  performed  as 
the  primary  source  of  data.  It  is  important  to  note  that  the  system  was  not  re-tuned 
for  Run  2.  The  results  from  Run  2  are  more  indicative  of  real-world  system  perfor¬ 
mance,  since  the  system  would  most  likely  not  be  re-tuned  prior  to  runs.  Table  3.7 
summarizes  the  key  characteristics  of  each  run  including  the  location,  duration,  and 
equipment  used. 

3. 3  Summary 

This  chapter  has  presented  the  key  contributions  of  this  research  in  terms  of 
algorithm  development  for  both  the  automatic  calibration  and  affine  distortion  pre¬ 
diction  algorithms.  Additionally,  the  test  equipment  and  experiments  used  to  validate 
the  algorithms  were  described.  The  results  from  these  experiments  are  analyzed  and 
illustrated  in  Chapter  IV. 
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Table  3.5:  Sensor  installation  table  for  Run  1.  This 
table  summarizes  the  (vehicle)  body-to-sensor  transla¬ 
tion  vector  (T£s)  and  sensor-to-body  DCMs  (C(!)  for 
each  sensor  used  in  Run  1.  The  body  ( b )  frame  is 
defined  as  shown  in  Figure  2.3. 


TiM 

C* 

IMU 

0.0000 

0.0000 

0.0000 

1.0000  0.0000  0.0000 

0.0000  1.0000  0.0000 

0.0000  0.0000  1.0000 

Cam  1 

0.0191 

-0.2273 

-0.0080 

0.0000  0.0000  1.0000 

0.0000  1.0000  0.0000 

-1.0000  0.0000  0.0000 

Cam  2 

0.0204 

0.2289 

-0.0135 

-0.0031  -0.0015  1.0000 

0.0029  1.0000  0.0015 

-1.0000  0.0029  -0.0031 

Cam  3 

-0.0864 

0.1692 

-0.0476 

-0.0353  -0.9994  -0.0049 

0.0251  -0.0058  0.9997 

-0.9991  0.0352  0.0253 
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Table  3.6:  Sensor  installation  table  for  Run  2.  This 
table  summarizes  the  (vehicle)  body-to-sensor  transla¬ 
tion  vector  (T£s)  and  sensor-to-body  DCMs  (C(!)  for 
each  sensor  used  in  Run  2.  The  body  (6)  frame  is 
defined  as  shown  in  Figure  2.3. 


Tin 

C^ 

IMU 

0.0000 

0.0000 

0.0000 

1.0000  0.0000  0.0000 

0.0000  1.0000  0.0000 

0.0000  0.0000  1.0000 

Cam  1 

0.0318 

-0.1655 

-0.0088 

0.0000  0.0000  1.0000 

0.0000  1.0000  0.0000 

-1.0000  0.0000  0.0000 

Cam  2 

0.0165 

1.6025 

-0.0040 

0.0081  0.0019  1.0000 

-0.0017  1.0000  -0.0019 

-1.0000  -0.0017  0.0081 

Cam  3 

-0.4731 

-0.2259 

-0.0500 

-0.0075  0.9988  0.0494 

0.0068  0.0494  -0.9988 

-0.9999  -0.0072  -0.0071 

Table  3.7:  Data  collection  run  descriptions.  This  ta¬ 
bic  characterizes  the  data  collection  runs  used  during 
this  research. 


Run  1 

Run  2 

Location 

Parking  lot 

Urban  downtown 

Vehicle 

Push  cart 

Van 

Duration  [mm:ss] 

5:40 

9:00 

Total  distance  [m] 

172.04 

2.257  x  103 

Left  Camera 

125495 

107971 

Right  Camera 

122865 

102828 

Side  Camera 

101205 

115970 

Side  Camera  Direction 

Right 

Left 

IMU 

MIDG  II 

MIDG  II 

Truth  Source 

HG1700  SPAN 

HG1700  SPAN 
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IV.  Results  and  Analysis 


This  chapter  discusses  the  results  from  the  experiments  described  in  Section  3.2, 
which  were  used  to  evaluate  the  automatic  calibration  and  affine  distortion 
prediction  algorithms  developed  during  this  thesis.  The  methods  used  to  process  and 
analyze  the  collected  data  are  presented  in  Section  4.1.  Sections  4.2  and  4.3  present 
and  discuss  the  results  from  the  automatic  calibration  and  affine  distortion  prediction 
algorithms  respectively. 

4-1  Data  Processing 

The  calibration  and  navigation  data  collected  during  the  experiments  described 
in  Section  3.2  were  post  processed  using  The  Mathworks,  Inc.’s  Matlab  software,  ver¬ 
sion  R2010B.  Data  processing  included  image  formatting,  truth  trajectory  generation, 
accelerometer  and  gyro  bias  estimation  and  error  calculation. 

4-1.1  Image  Formatting.  Prior  to  processing  each  run,  the  collected  raw 
image  data  was  timestamped  within  ±  1  millisecond,  renamed,  and  organized  into 
respective  folders  in  order  to  decrease  the  computational  load  on  the  IAEKF.  Next, 
every  image  was  pre-rectihed  to  account  for  lens  distortions  (as  explained  in  Section 
3. 1.2. 6).  Finally,  SIFT  features  were  precomputed  for  every  rectified  image  in  order 
to  reduce  the  run-time  load  on  the  IAEKF. 

4-1.2  Error  Calculation  .  A  tactical-grade  SPAN  navigation  platform  from 
Novatel,  consisting  of  a  Honeywell  HG1700  inertial  sensor  and  a  GPS  receiver,  was 
used  to  collect  high-fidelity  position,  velocity,  and  attitude  data  during  both  runs. 
This  truth  data  was  preprocessed  to  generate  true  position,  velocity,  and  attitude 
states  in  order  to  verify  the  performance  of  the  IAEKF  output.  State  errors  will  be 
computed  by  subtracting  estimated  states  from  their  respective  true  states  in  Section 
4.3.2. 
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4- 1.3  Bias  Estimation.  During  the  beginning  of  each  run,  the  IAEKF  was  al¬ 
lowed  to  estimate  accelerometer  and  gyroscope  biases  by  observing  INS  measurements 
and  comparing  them  to  known  or  reference  states.  For  this  research,  the  high-fidelity 
truth  data  was  given  to  the  IAEKF  in  the  form  of  alignment  measurements  with  low 
covariance  for  a  short  duration.  In  real-world  applications,  this  process  can  be  also 
accomplished  by  allowing  the  IAEKF  to  run  while  the  vehicle  is  stationary.  In  order 
to  simulate  real-world  applications,  the  collection  rigs  remained  stationary  for  a  brief 
period  of  time  before  executing  the  run.  Run  1  (calibration  run)  was  aligned  for  two 
minutes,  while  Run  2  (demonstration  run)  was  aligned  for  30  seconds. 

4-2  Automatic  Camera  Calibration 

Since  there  was  no  practical  way  of  obtaining  truth  data  for  comparison,  the 
automatic  calibration  algorithm  was  tested  by  calibrating  a  single  binocular  pair  of 
consumer-grade  cameras  and  comparing  the  calibration  results  to  a  standard  calibra¬ 
tion  from  Bouguet’s  toolbox.  Instead  of  comparing  each  parameter,  the  calibration 
routines  were  compared  in  terms  of  average  reprojection  error,  which  represents  the 
effective  accuracy  of  the  entire  camera  model  produced.  As  shown  in  Table  4.1,  the 
automatic  calibration  algorithm  provided  a  70  %  increase  in  calibration  speed  (by 
eliminating  manual  calibration  steps)  and  averaged  4  times  more  calibration  points 
per  image  while  maintaining  sub-pixel  errors.  It  is  important  to  note  that,  unlike 
Bouguet’s  algorithm,  the  automatic  calibration  algorithm  did  not  require  the  entire 
calibration  board  to  be  visible,  which  allowed  the  calibration  board  to  be  held  closer 
to  the  binocular  camera  set.  Consequently,  the  calibration  board  covered  a  larger 
part  of  the  pix- frame  when  compared  to  standard  calibration  as  shown  in  Figure  4.1. 
Although  the  mean  reprojection  error  was  sub-pixel,  the  automatic  calibration  algo¬ 
rithm  exhibited  a  significant  increase  in  reprojection  error  spread  when  compared  to 
Bouget’s  toolbox  (illustrated  in  Figure  4.2).  This  increase  in  spread  is  most  likely 
due  to  the  lower  SIFT  feature  localization  and  matching  accuracy  when  compared  to 
the  corner  localization  accuracy  of  the  Harris  corner  detector.  Additionally,  since  the 
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Table  4.1:  Automatic  calibration  results.  This  table 
compares  the  calibration  results  from  an  automatic 
calibration  routine  and  a  standard  calibration  routine. 


Standard 

Automatic 

Calibration  time  [min] 

Number  of  images  used 
Approximate  points  per  image 
Average  reprojection  error  [pix] 

10 

10 

100 

[0.20  0.18] 

3 

7 

400 

[0.35  0.27] 

user  did  not  manually  dclinate  the  calibration  board,  the  algorithm’s  estimate  of  the 
board’s  pose  ([R  t])  could  have  been  degraded,  which  resulted  in  degraded  camera 
model  estimates  (A). 

4-3  Affine  Distortion  Prediction 

The  ADP  algorithm  was  tested  during  two  collection  runs,  using  similar  collec¬ 
tion  rigs.  The  calibration  run  was  collected  using  the  push-cart  rig,  and  its  primary 
purpose  was  to  collect  enough  data  to  properly  tune  the  IAEKF.  Having  tuned  the 
system  variables,  the  demonstration  run  which  was  collected  using  the  van-mounted 
rig,  was  used  to  verify  the  effects  of  ADP  while  varying  the  number  of  cameras  used. 

4-3.1  Trajectory  Comparisons.  As  mentioned  in  Section  4.1.2,  the  high- 
fidelity  truth  data  was  used  to  generate  “true”  trajectories  for  each  run.  In  this 
section,  the  IAEKF  filtered  output  trajectories  are  compared  to  the  truth  trajectories 
in  the  horizontal  (North-East  plane)  and  vertical  (Down)  dimensions. 

4-3. 1.1  Calibration  Run.  As  shown  in  Figures  4.3  and  4.4,  varying  the 
number  of  cameras  or  the  usage  of  ADP  did  not  significantly  alter  the  resulting  filter 
output  trajectory  for  the  calibration  run.  This  was  somewhat  expected  clue  to  its  short 
duration.  It  is  important,  however,  to  note  that  the  processing  profiles  including  the 
usage  of  3  cameras  (with  ADP  on  and  off)  did  result  in  a  Down  trajectory  closer  to  the 
known  terrain  variations  of  the  collection  location.  Since  the  data  collection  motion 
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profile  included  a  double  loop  in  the  horizontal  trajectory,  the  vertical  trajectory  was 
expected  to  exhibit  a  double  “dip”  which  was  not  present  in  the  reference  trajectory, 
but  approximated  by  the  IAEKF  output  trajectories  involving  3  cameras.  This  minor 
yet  important  discrepancy  points  at  an  underlying  problem  in  the  truth  data  for  this 
run.  Therefore,  the  truth  data  was  discarded  after  t  =  250  seconds.  Again,  since  the 
run  was  useful  in  providing  tuning  parameters,  this  partial  loss  of  truth  data  was  not 
catastrophic. 

4-3. 1.2  Demonstration  Run.  The  trajectory  comparisons  for  the 
demonstration  run  are  shown  in  Figures  4.5  and  4.6.  As  illustrated  by  the  figures, 
each  processing  profile  resulted  in  a  slightly  different  IAEKF  output  trajectory.  The 
differences  in  the  output  trajectories  are  most  noticeable  in  the  latter  part  of  the 
run  (as  expected),  especially  in  the  vertical  trajectory.  These  trajectory  comparisons 
indicate,  at  an  early  stage,  that  the  usage  of  ADP  resulted  in  filtered  trajectories 
closer  to  the  truth  in  both  the  two  and  three  camera  configurations.  This  is  especially 
noticeable  in  the  vertical  trajectory  comparison  (Figure  4.6)  and  could  be  attributed 
to  either  the  increase  of  positive  landmark  matches,  or  the  decrease  of  false  landmark 
matches  during  the  run.  Unfortunately,  the  veracity  of  individual  landmark  matches 
is  impractical  to  determine  since  it  would  require  careful  manual  supervision  of  the 
real-time  navigation  computations  for  the  run. 

4-3.2  Navigation  State  Errors.  In  this  section,  the  IAEKF  filtered  outputs 
for  the  nine  navigation  states  (3  position,  3  velocity,  3  attitude)  are  analyzed  in  terms 
of  error,  which  was  obtained  by  subtracting  the  corresponding  true  states  as  described 
in  Section  4.1.2. 

4-3. 2.1  Calibration  Run.  Table  Table  4.2  summarizes  the  Root  Sum 
Squared  (RSS)  errors  for  the  nine  Position,  Velocity,  and  Attitude  (PVA)  navigation 
states  in  the  calibration  run,  while  Figures  4.7  through  4.12  illustrate  the  absolute 
and  RSS  errors  for  each  navigation  state  as  functions  of  time.  Since  the  truth  data 


80 


Table  4.2:  PVA  RSS  errors  for  the  calibration  run. 
This  table  summarizes  the  mean  RSS  position,  veloc¬ 
ity,  and  attitude  errors  for  the  calibration  run  across 
all  four  processing  configurations.  The  4th  row  of  the 
table  displays  position  errors  as  a  percentage  of  the 
total  distance  traveled  (172  m). 


Cl 

C2 

C3 

C4 

Cameras 

2 

2 

3 

3 

ADP 

Off 

On 

Off 

On 

Position  [m] 

0.64 

0.64 

0.63 

0.63 

Position  [%] 

0.37 

0.37 

0.37 

0.37 

Velocity  [m/s] 

0.053 

0.053 

0.055 

0.055 

Attitude  [mrad] 

41.98 

41.98 

42.06 

42.06 

was  determined  to  be  invalid  after  t  =  250,  the  error  computations  have  been  limited 
to  exclude  times  greater  than  250  seconds. 

As  illustrated  by  Figures  4.7  through  4.12  and  foreshadowed  by  the  trajectory 
comparisons,  there  was  no  significant  difference  in  position,  velocity  or  attitude  errors 
for  the  calibration  run.  This  is  most  likely  due  to  the  short  duration  of  the  run.  The 
mean  position,  velocity,  and  attitude  RSS  errors  increased  slightly  for  the  process¬ 
ing  profiles  involving  three  cameras,  but  the  increase  amounts  were  not  statistically 
significant. 
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(a) 


(b) 

Figure  4.1:  Comparison  of  pix- frame  coverage  for  calibration,  (a)  Manual  Calibra¬ 
tion  (b)  Automatic  Calibration.  The  increased  lens  and  pix-frame  coverage,  enabled 
through  automatic  calibration,  explains  the  slight  increase  in  reprojection  error  when 
compared  to  Bouguet’s  toolbox. 


82 


(a) 


3 

2 

1 

0 

-1 

-2 

-3 

-4 


Figure  4.2:  Calibration  reprojection  error  illustration,  (a)  Manual  Calibration  (b) 
Automatic  Calibration.  Although  still  providing  sub-pixel  averages,  the  automatic 
calibration  routine  showed  a  slight  increase  in  average  reprojection  error  and  spread 
when  compared  to  Bouguet’s  toolbox. 
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Figure  4.3:  N-E  trajectory  comparison  plots  for  the  calibration  run. 
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HAE  (Down)  [m] 


Vertical  Trajectory  Comparison  (Local-Level  NED) 


Figure  4.4:  Down  trajectory  comparison  plots  for  the  calibration  run.  Note 
that  truth  data  is  invalid  after  t  =  250  s. 
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i  4.5:  N-E  trajectory  comparison  plots  for  the  demonstration  run. 


HAE  (Down)  [m] 


Vertical  Trajectory  Comparison  (Local-Level  NED) 


Figure  4.6:  Down  trajectory  comparison  plots  for  the  demonstration  run. 
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Figure  4.7:  Position  error  plots  for  the  calibration  run.  Note  that  ADP  ON 
and  ADP  OFF  profiles  overlap  for  both  2  and  3  camera  configurations. 


RSS  Error  (m) 


Position  RSS  Error  Comparison 


Figure  4.8:  Position  RSS  error  plots  for  the  calibration  run.  Note  that  ADP 
ON  and  ADP  OFF  profiles  overlap  for  both  2  and  3  camera  configurations. 


Figure  4.9:  Velocity  error  plots  for  the  calibration  run.  Note  that  ADP  ON 
and  ADP  OFF  profiles  overlap  for  both  2  and  3  camera  configurations. 
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RSS  Error  (m/s) 


Velocity  RSS  Error  Comparison 


Figure  4.10:  Velocity  RSS  error  plots  for  the  calibration  run.  Note  that  ADP 
ON  and  ADP  OFF  profiles  overlap  for  both  2  and  3  camera  configurations. 
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Figure  4.11:  Attitude  error  plots  for  the  calibration  run.  Note  that  ADP 
ON  and  ADP  OFF  profiles  overlap  for  both  2  and  3  camera  configurations. 
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Attitude  RSS  Error  Comparison 


Figure  4.12:  Attitude  RSS  error  plots  for  the  calibration  run.  Note  that 

ADP  ON  and  ADP  OFF  profiles  overlap  for  both  2  and  3  camera  configura¬ 
tions. 
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Table  4.3:  PVA  RSS  errors  for  the  demonstration 
run.  This  table  summarizes  the  mean  RSS  position, 
velocity,  and  attitude  errors  for  the  demonstration  run 
across  all  four  processing  configurations.  The  4th  row 
of  the  table  displays  position  errors  as  a  percentage  of 
the  total  distance  traveled  (2.26  km). 


Cl 

C2 

C3 

C4 

Cameras 

2 

2 

3 

3 

ADP 

Off 

On 

Off 

On 

Position  [m] 

46.25 

42.10 

52.92 

34.98 

Position  [%] 

2.05 

1.87 

2.34 

1.53 

Velocity  [m/s] 

1.16 

1.14 

1.30 

0.97 

Attitude  [mrad] 

129.89 

129.57 

163.74 

84.35 

4-3. 2. 2  Demonstration  Run.  Table  4.3  summarizes  the  RSS  errors  for 
the  nine  PVA  navigation  states  in  the  demonstration  run,  while  Figures  4.13  through 
4.18  illustrate  the  absolute  and  RSS  errors  for  each  navigation  state  over  time. 

For  the  demonstration  run,  the  processing  profile  selection  had  a  significant 
impact  on  navigation  errors.  In  an  absolute  sense,  the  position,  velocity,  and  attitude 
errors  for  this  run  were  higher  across  all  processing  profiles  when  compared  to  the 
calibration  run.  However,  it  is  important  to  note  that  the  demonstration  run  was 
approximately  1.6  times  longer  in  duration  and  13.1  times  larger  in  total  distance  when 
compared  to  the  calibration  run.  Since  the  IAEKF  algorithm  is  still  based  on  dead¬ 
reckoning,  its  errors  are  neither  stationary  nor  ergodic.  Therefore,  it  is  reasonable  to 
expect  that  long  runs  may  not  have  the  same  errors  as  short  runs. 

Analyzing  the  errors  in  the  demonstration  run  relative  to  each  other  confirms 
some  of  the  initial  trends  foreshadowed  in  the  trajectory  analysis,  but  also  reveals 
an  interesting  phenomenon.  Performing  further  analysis  on  the  numbers  provided 
by  Table  4.3  shows  that  using  the  ADP  algorithm  always  improved  performance 
across  all  navigation  states.  Interestingly  however,  using  a  three-camera  configuration 
without  ADP  produced  the  highest  errors  across  all  states  in  the  demonstration  run. 
In  contrast,  the  three-camera  configuration  with  ADP  resulted  in  the  lowest  errors 
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Figure  4.13:  Position  error  plots  for  the  demonstration  run. 

across  all  states.  These  results  showcase  of  the  effects  of  false  landmark  matches 
on  navigation  accuracy.  This  disparate  behavior  between  the  two  processing  profiles 
involving  three  cameras  hints  at  a  possible  increase  of  positive  matches,  decrease  of 
false  matches,  or  a  combination  thereof  when  using  ADP. 
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Figure  4.14:  Position  RSS  error  plots  for  the  demonstration  run. 
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Figure  4.15:  Velocity  error  plots  for  the  demonstration  run. 
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Velocity  RSS  Error  Comparison 


Figure  4.16:  Velocity  RSS  error  plots  for  the  demonstration  run. 
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Figure  4.17:  Attitude  error  plots  for  the  demonstration  run. 
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RSS  Error  (mrad) 


Attitude  RSS  Error  Comparison 


Figure  4.18:  Attitude  RSS  error  plots  for  the  demonstration  run. 
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4-3.3  Landmark  Tracking.  Since  ADP  is  designed  to  directly  improve  long- 
baseline  feature  tracking,  it  is  important  to  analyze  feature  matching  and  tracking 
performance  for  each  of  the  runs. 

4-3.3. 1  Calibi'ation  Run.  Tables  4.4  and  4.5  summarize  the  landmark 
match  and  track  performance  for  the  calibration  run.  As  expected,  the  number  of 
total  landmark  matches  is  significantly  higher  for  Camera  1  and  Camera  2  due  to 
their  relative  position  with  respect  to  the  vehicle.  Additionally,  since  all  landmarks 
are  always  initialized  by  the  front  camera  pair,  it  is  natural  for  those  cameras  to 
obtain  higher  match  counts.  Following  the  trend  set  by  the  trajectory  and  state 
error  comparisons  for  the  calibration  run,  there  was  no  major  statistical  difference 
in  the  total  landmark  match  counts  across  the  various  processing  profiles.  In  terms 
of  landmark  track  statistics,  the  minimum,  maximum,  mean,  and  standard  deviation 
for  landmark  track  duration  remained  statistically  unchanged  across  all  processing 
profiles.  Figures  4.19  through  4.21  illustrate  the  cumulative  landmark  match  counts 
over  time  for  each  of  the  three  cameras.  As  illustrated  by  the  figures,  all  processing 
profiles  resulted  in  similar  cumulative  landmark  match  counts  with  respect  to  time. 
Although  not  statistically  significant,  Figure  4.21  did  display  a  slightly  increased  slope 
on  the  number  of  landmark  matches  with  respect  to  time,  which  could  be  an  early 
indication  of  the  effects  of  ADP  on  long-baseline  landmark  tracking.  Additionally,  it 
is  important  to  note  that  the  matches  obtained  with  ADP  off  were  not  necessarily  the 
same  matches  obtained  with  ADP  on;  it  is  possible,  as  supported  by  the  state  error 
results,  that  enabling  ADP  decreased  false  matches  and  increased  positive  matches. 

4-3. 3. 2  Demonstration  Run.  Tables  4.6  and  4.7  summarize  the  land¬ 
mark  match  and  track  performance  for  the  demonstration  run.  The  landmark  match 
counts  for  the  front  camera  pair  were  very  similar  across  all  processing  profiles,  much 
like  in  the  calibration  run.  Analyzing  the  results  from  Camera  3  reveals  more  support¬ 
ing  evidence  for  improved  side-camera  match  performance  when  using  ADP.  There 
is  only  a  small  difference  between  the  two  final  landmark  match  counts  for  Cam- 
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Table  4.4:  Landmark  match  counts  for  the  calibra¬ 
tion  run  1.  This  table  summarizes  the  cumulative 
landmark  match  counts  for  the  calibration  run  1  across 
all  four  processing  configurations. 


Cl 

C2 

C3 

C4 

Cameras 

2 

2 

3 

3 

ADP 

Off 

On 

Off 

On 

Cam  1 

6.83  x  103 

6.83  x  103 

6.89  x  103 

6.87  x  103 

Cam  2 

6.92  x  103 

6.93  x  103 

6.94  x  103 

6.89  x  103 

Cam  3 

NA 

NA 

29 

30 

Table  4.5:  Landmark  track  statistics  the  calibration 
run.  This  table  summarizes  the  landmark  tracking 
duration  statistics  for  the  calibration  run  across  all 
four  processing  configurations. 


Cl 

C2 

C3 

C4 

Cameras 

2 

2 

3 

3 

ADP 

Off 

On 

Off 

On 

Min  [s] 

0 

0 

0 

0 

Max  [s] 

52.00 

52.00 

52.00 

52.00 

Mean  [s] 

4.52 

4.58 

4.58 

4.58 

STD  [s] 

7.41 

7.40 

7.43 

7.40 

era  3.  However,  this  information,  coupled  with  the  significant  improvement  across 
all  navigation  states  when  using  ADP,  points  at  bigger  underlying  improvement  in 
landmark  match  performance.  Thus,  the  data  is  not  intended  to  imply  that  the  Cam¬ 
era  3  matches  with  ADP  off  were  the  same  ones  found  with  ADP  on.  Rather,  while 
the  count  is  similar,  the  error  analysis  suggests  that  enabling  ADP  increased  match 
accuracy.  Figures  4.22  and  4.23  display  similar  landmark  match  performance  for  the 
front  camera  pair,  as  was  the  case  with  the  calibration  run.  Meanwhile,  Figure  4.24 
shows  a  significant  increase  in  cumulative  landmark  match  counts  for  Camera  3  with 
respect  to  time,  when  using  ADP.  Finally,  the  results  shown  in  Table  4.7  reveal  no 
significant  difference  in  minimum,  maximum,  mean  or  standard  deviation  in  landmark 
track  durations  across  the  four  processing  profiles.  This  is  yet  another  indicator  that 
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Camera  1  Track  Details 


using  ADP  may  not  be  simply  increasing  the  number  of  positive  matches,  but  more 
likely  performing  a  combination  of  false  match  reduction  and  positive  match  addition, 
which  resulted  in  a  small  net  positive  change  with  large  improvements  in  navigation 
accuracy. 

4.3.4  Computational  Load.  Although  this  research  was  not  aimed  at  opti¬ 
mizing  computations  for  real-time  implementations,  it  is  still  important  to  note  the 
computational  load  differences  between  all  four  processing  profiles.  Table  4.8  sum¬ 
marizes  the  computational  time  required  to  complete  each  run  using  the  various  pro¬ 
cessing  profiles.  As  expected,  the  processing  profiles  involving  three  cameras  required 
longer  computational  times  due  to  the  recursive  use  of  the  SIFT  algorithm.  Addi¬ 
tionally,  the  usage  of  ADP  increased  the  required  computational  time  non-linearly 
in  every  case.  It  is  important  to  note  that  even  with  the  fastest  processing  profile 
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Camera  2  Track  Details 


(two  camera,  no  ADP),  both  runs  required  significantly  more  time  than  the  actual 
duration  of  the  run.  Again,  this  research  did  not  aim  at  optimizing  these  algorithms 
for  real-time  implementation,  but  such  efforts  will  be  necessary  before  fully  fielding  a 
system  equipped  with  this  technology. 


4-4  Summary 

This  section  has  presented  the  results  and  analysis  from  the  experiments  de¬ 
scribed  in  Chapter  111.  The  automatic  calibration  algorithm  has  been  shown  to  not 
only  decrease  the  required  time  for  calibration,  but  also  vastly  reduce  the  necessary 
user  interaction  when  compared  to  the  standard  calibration  technique.  Finally,  the 
usage  of  affine  distortion  prediction  and  a  side-looking  camera  has  been  shown  to  sig¬ 
nificantly  improve  vehicle  position,  velocity,  and  attitude  estimates  by  the  IAEKF.  By 
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Camera  3  Track  Details 


understanding  the  nature  and  mathematics  behind  computer  feature  matching  tech¬ 
niques,  two  major  steps  in  the  image-aided  navigation  cycle  (calibration  and  feature 
tracking)  have  been  further  matured. 
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Table  4.6:  Landmark  match  counts  for  the  demon¬ 
stration  run.  This  table  summarizes  the  cumulative 
landmark  match  counts  for  the  demonstration  run 
across  all  four  processing  configurations. 


Cl 

C2 

C3 

C4 

Cameras 

2 

2 

3 

3 

ADP 

Off 

On 

Off 

On 

Cam  1 

2.06  x  104 

2.03  x  104 

2.05  x  104 

2.08  x  104 

Cam  2 

2.17  x  104 

2.12  x  104 

2.20  x  104 

2.15  x  104 

Cam  3 

NA 

NA 

27 

30 

Table  4.7:  Landmark  track  statistics  for  the  demon¬ 
stration  run.  This  table  summarizes  the  landmark 
tracking  duration  statistics  for  the  demonstration  run 
across  all  four  processing  configurations. 


Cl 

C2 

C3 

C4 

Cameras 

2 

2 

3 

3 

ADP 

Off 

On 

Off 

On 

Min  [s] 

0 

0 

0 

0 

Max  [s] 

41.50 

41.75 

41.75 

41.75 

Mean  [s] 

1.39 

1.39 

1.45 

1.42 

STD  [s] 

4.04 

3.98 

4.10 

4.06 

Table  4.8:  Processing  times  table.  This  table  sum¬ 
marizes  the  processing  times  required  to  complete  the 
two  collected  runs  across  all  four  processing  configu¬ 
rations.  All  times  are  in  [hh:mm:ss]. 


Cl 

C2 

C3 

C4 

Cameras 

2 

2 

3 

3 

ADP 

Off 

On 

Off 

On 

Calibration  Run 

00:06:44 

00:10:10 

00:18:55 

01:18:16 

Demonstration  Run 

00:48:10 

06:36:59 

01:24:49 

14:39:26 
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Cumulative  Features  Tracked 


Camera  1  Track  Details 


Figure  4.22:  Camera  1  landmark  matches  for  the  demonstration  run. 


107 


Cumulative  Features  Tracked 


Camera  2  Track  Details 


Figure  4.23:  Camera  2  landmark  matches  for  the  demonstration  run. 
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Cumulative  Features  Tracked 


Camera  3  Track  Details 


Figure  4.24:  Camera  3  landmark  matches  for  the  demonstration  run. 
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V.  Conclusions  and  Future  Work 


This  chapter  summarizes  the  information  presented  in  earlier  sections  of  this 
thesis  as  well  as  suggestions  for  continued  improvement  through  future  work. 

5.1  Conclusions 

As  the  Air  Force’s  dependency  on  precision  navigation  information  has  in¬ 
creased,  so  has  the  interest  in  overcoming  the  reliance  on  GPS.  Image-aided  naviga¬ 
tion,  or  the  coupling  of  inertial  and  imaging  sensors  has  grown  as  a  possible  alternate 
precision  navigation  technology  over  the  past  decade  and  continues  to  show  promis¬ 
ing  results.  This  research  has  focused  on  improving  two  key  steps  in  the  image-aided 
navigation  process:  camera  calibration  and  landmark  tracking.  Consequently,  the 
two  main  objectives  of  the  research  have  been  to  optimize  camera  calibration  through 
automation  and  improve  landmark  tracking  accuracy  through  affine  distortion  predic¬ 
tion.  Together,  these  two  computer  vision  techniques  allow  for  precision  navigation 
through  the  deep,  mutualistic  coupling  of  inertial  and  imaging  sensors. 

The  automatic  camera  calibration  algorithm  was  developed  from  an  existing 
manual  camera  calibration  toolbox  by  automating  specific  steps  which  previously  re¬ 
quired  a  human  in  the  loop.  Specifically,  the  standard  chess-pattern  calibration  board 
was  substituted  with  an  arbitrary  image  on  which  distinct  features  can  be  found.  Fea¬ 
ture  matches  between  the  physical  and  digital  copies  of  the  calibration  board  were 
automatically  found,  removing  the  need  for  the  manual  delineation  of  the  calibration 
board  perimeter,  as  was  required  by  the  original  toolbox.  Having  automatically  com¬ 
puted  the  necessary  correspondences  between  points  in  the  calibration  board  frame 
with  points  in  the  pixel  frame,  the  calibration  algorithm  was  finished  as  usual,  which 
resulted  in  a  completely  automatic  camera  calibration  toolbox.  The  automatic  camera 
calibration  algorithm  was  presented  at  the  7th  Annual  Dayton  Engineering  Sciences 
Symposium  [8]. 

A  novel  affine  distortion  prediction  algorithm  was  developed  in  order  to  counter¬ 
act  the  distortions  arising  from  changes  in  3-D  viewpoint,  which  significantly  feature 
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matching  accuracy.  Affine  distortions  on  2-D  images  were  simulated  by  assuming  all 
points  on  a  given  image  are  actually  three-dimensional  and  coplanar.  Starting  with 
the  image-aided  navigation  extended  Kalman  filter  built  by  Veth  [22]  as  a  baseline, 
the  affine  prediction  algorithm  was  used  to  simulate  affine  distortions  on  existing  im¬ 
ages,  based  on  the  relative  change  in  position  between  a  vehicle  and  the  landmarks 
it  tracked.  The  distorted  images  were  then  used  to  compute  new  feature  descrip¬ 
tors  that  better  approximated  those  found  in  new  images.  In  contrast  to  ASIFT  [15], 
where  all  possible  distortions  are  simulated  for  every  input  image,  the  affine  distortion 
prediction  algorithm  only  simulated  the  necessary  image  distortions  due  to  its  deep 
integration  into  the  EKF.  The  enhanced  IAEKF  algorithm  with  affine  distortion  pre¬ 
diction  was  presented  at  the  Position  Location  and  Navigation  System  Conference  [9]. 

An  experiment  was  developed  to  evaluate  the  performance  of  the  automatic 
calibration  and  affine  distortion  prediction  algorithms.  For  the  camera  calibration  al¬ 
gorithm,  a  binocular  set  of  cameras  were  calibrated  using  both  the  standard  (manual) 
toolbox  with  the  chess-pattern  board,  and  the  automatic  algorithm  with  the  modified 
board.  The  two  calibration  techniques  were  compared  in  terms  of  total  time  required, 
average  number  of  point  correspondences,  lens  coverage,  and  reprojection  errors  in 
the  resulting  camera  model. 

For  the  affine  distortion  prediction  algorithm,  two  outdoor  collection  runs  were 
performed  using  a  three-camera  setup,  augmented  with  a  MEMS-grade  INS  sensor, 
and  a  high-fidelity  truth  source.  The  first  run  was  collected  using  a  push-cart  around 
a  full  parking  lot  and  featured  mostly  left-hand  turns  with  a  right-facing  side  camera. 
The  second  run  was  collected  using  a  van  driving  around  a  downtown  urban  envi¬ 
ronment,  and  featured  mostly  right-hand  turns  with  a  left-facing  side  camera.  The 
two  data  collections  were  then  post-processed  through  the  extended  Kalman  filter, 
using  various  processing  profiles  in  which  the  number  of  cameras  and  the  usage  of 
affine  distortion  prediction  were  individually  varied.  The  Liter’s  results  were  then 
compared  with  the  solutions  from  the  high-fidelity  truth  source  for  each  run  in  order 
to  determine  their  accuracy. 
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The  automatic  calibration  algorithm  showed  a  70%  decrease  in  required  time 
when  compared  to  the  manual  calibration.  Additionally,  it  increased  the  average 
number  of  point  correspondences  used  for  calibration  from  1000  to  5600,  which  was 
directly  associated  with  increased  lens  coverage.  Although  the  average  pixel  reprojec¬ 
tion  error  was  slightly  higher  for  automatic  calibration,  it  was  still  sub-pixel  for  both 
the  x  and  y  axes.  The  significant  increase  in  reprojection  error  spread  was  attributed 
to  the  decreased  feature  localization  and  matching  accuracy  of  the  SIFT  algorithm 
when  compared  to  the  Harris  corner  detector. 

The  affine  distortion  prediction  algorithm  significantly  improved  the  accuracy 
of  all  nine  navigation  states  for  the  demonstration  run.  It  was  concluded  that  the 
calibration  run  was  too  short  to  show  any  significant  differences.  Further  investigation 
revealed  that  using  affine  distortion  prediction  resulted  in  more  accurate  results  due 
to  a  combination  of  reduction  in  false  matches  and  increase  in  positive  matches.  It 
was  also  determined  that  augmenting  the  standard  forward-looking  camera  pair  with 
a  side-looking  camera  without  affine  prediction  actually  increased  navigation  errors, 
possibly  due  to  false  landmark  matches.  Using  a  standard  image-aided  navigation 
setup  consisting  of  two  forward-looking  cameras  and  no  affine  distortion  prediction 
as  a  baseline,  the  augmented  configuration  with  a  side-looking  camera  and  affine 
distortion  prediction  provided  an  average  reduction  in  error  of  24%  in  position,  16% 
in  velocity  and  35%  in  attitude. 

5.2  Future  Work 

Both  of  the  algorithms  developed  in  this  thesis  could  be  improved  upon  in  the 
quest  for  a  reliable,  mature  alternate  precision  navigation  technology.  This  section 
presents  some  of  the  possible  areas  of  improvement  for  future  research. 

5.2.1  Multi-sensor  Extrinsic  Calibration.  The  automatic  calibration  algo¬ 
rithm  presented  in  this  thesis  computes  the  camera  model  for  each  individual  camera 
in  a  system,  as  well  as  the  relative  rotation  and  translation  between  each  camera  using 
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the  first  camera’s  frame  as  the  reference  frame.  However,  most  image-aided  naviga¬ 
tion  platforms  are  equipped  with  more  than  just  imaging  sensors.  At  a  minimum, 
image-aided  navigation  as  presented  herein  requires  the  use  of  an  inertial  sensor  in 
addition  to  the  imaging  sensors.  For  this  reason,  it  would  be  wise  to  further  develop 
the  automatic  camera  calibration  algorithm  presented  in  this  thesis  into  a  full- vehicle, 
multi-sensor  calibration  process,  which  provides  camera  models  as  well  as  the  relative 
rotation  and  translation  between  every  sensor.  A  vehicle  calibration  algorithm  that 
is  fully  automatic  would  certainly  propel  image-aided  navigation  into  a  new  level  of 
maturity  and  held  readiness. 

5.2.2  Calibration  Board  Elimination.  Although  the  automatic  calibration 
algorithm  presented  in  this  thesis  has  eliminated  the  need  for  user  interaction  during 
the  computation  phase,  a  human-in-the-loop  is  still  required  to  hold  a  calibration 
board  in  front  of  the  cameras  to  collect  the  calibration  images.  The  use  of  a  calibration 
board  makes  camera  calibration  relatively  simple,  but  adds  a  level  of  manual  work  that 
is  not  practical  for  held  operations.  If  image-aided  navigation  and  computer  vision 
technology  are  to  be  implemented  in  real-world  scenarios,  with  actual  navigation 
platforms,  the  automatic  camera  calibration  algorithm  should  be  further  developed  to 
eliminate  the  need  for  a  calibration  board.  The  calibration  board  could  be  eliminated 
by  using  known  features  within  the  held  of  view  of  the  cameras  while  still  maintaining 
the  automation  provided  by  the  SIFT  algorithm. 

5.2.3  Projective  Distoidion  Modeling.  The  affine  distortion  prediction  al¬ 
gorithm  signihcantly  improved  feature  matching  accuracy  by  computing  new  feature 
descriptors  on  digitally  distorted  images.  However,  the  affine  camera  model  on  which 
affine  distortion  prediction  is  based  is  not  exactly  representative  of  real-world  distor¬ 
tions.  In  reality,  features  undergo  a  projective  distortion  which  has  been  approximated 
by  the  affine  model.  Projective  models  include  the  effects  of  vanishing  points  created 
by  the  projection  of  light  into  the  focal  point  of  an  imaging  sensor.  Although  affine 
approximations  have  been  shown  to  provide  significant  improvements  in  feature  track- 
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ing,  it  would  be  insightful  to  investigate  the  improvements  (if  any)  provided  by  fully 
modeling  projective  distortions. 

5.2.4  Modular  Landmark  Tracking  Filter.  One  of  the  key  computational 
limitations  on  the  image-aided  navigation  algorithm  developed  by  Veth  [22]  is  the 
requirement  to  augment  the  extended  Kalman  filter  with  three  additional  states  for 
each  landmark  being  tracked.  For  this  reason,  the  maximum  number  of  landmarks 
tracked  by  any  camera  was  kept  to  30  (resulting  in  90  additional  states).  Taking  into 
consideration  that  the  average  image  contains  approximately  900  salient  features,  only 
3%  of  the  available  visual  information  is  being  used  for  navigation.  However,  setting 
the  maximum  number  of  features  to  even  200  would  require  the  use  of  600  additional 
states,  and  pose  a  heavy  computational  load.  Although  Veth’s  implementation  [22] 
provides  very  deep  coupling  between  the  feature  tracker  and  the  inertial  sensor,  it 
limits  (computationally)  the  maximum  number  of  features  that  can  be  tracked.  A 
potentially  more  computationally-friendly  approach  would  implement  feature  tracking 
without  requiring  additional  Kalman  filter  states  in  the  navigation  filter  by  adding  a 
parallel  filter  for  landmark  estimation  and  tracking.  This  modular  architecture  may 
improve  the  overall  system’s  ability  to  grow  or  shrink  the  number  of  landmarks  as 
well  as  add  or  remove  vision-aiding  altogether  without  significant  system  re-design. 
Such  an  architecture  may  pave  the  way  towards  modular,  all  source  navigation,  where 
users  may  select  a  suite  of  sensors  suited  for  their  particular  application  or  mission  [5]. 

5.2.5  Catalog-based  Landmark  Navigation.  Finally,  the  feature  tracker  im¬ 
plemented  in  the  current  image-aided  algorithm  provides  state  measurements  which 
are  relative  from  frame  to  frame.  That  is,  the  standard  feature  tracker  tells  the  ex¬ 
tended  Kalman  filter  how  the  vehicle  moved  from  one  epoch  to  the  next.  Consequently, 
there  is  a  significant  possibility  of  accumulating  navigation  errors,  especially  in  posi¬ 
tion  and  attitude,  because  there  are  no  absolute  measurements  being  provided.  One 
method  of  combating  such  incremental  error  accumulation  is  to  implement  a  catalog- 
based  feature  tracker  in  addition  to  the  standard  feature  tracker.  A  catalog-based 
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tracker  would  provide  absolute  position  measurements  based  on  known  landmark  lo¬ 
cations,  or  previously  tracked  landmarks  that  had  relatively  low  position  uncertainty. 


5.3  Closing 

This  research  has  presented  two  signal  processing  algorithms  designed  to  im¬ 
prove  the  image-aided  navigation  process.  As  this  technology  continues  to  mature, 
a  reliable,  equally-precise  alternative  to  navigation  via  GPS  may  be  achieved.  The 
necessary  processes  have  been  developed;  it  is  now  up  to  the  next  wave  of  scholars  to 
propel  this  theory  into  a  practical,  fielded  reality. 
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